Principal Component Analysis
- Curse of Dimensionality and Dimensionality Reduction
- Common Terms and Mathematical Concepts in PCA
- Steps of PCA Algorithm
- Applications and Advantages of PCA
- Disadvantages and Limitations of PCA
Computational and time costs grow with the number of features, or dimensions, in a dataset. High-dimensional data also tends to cause overfitting and lower model accuracy, a problem known as the "curse of dimensionality."
The problem arises because the number of possible feature-value combinations grows exponentially with the number of dimensions, so a fixed amount of data covers the space ever more sparsely and operations become more computationally intensive. Feature engineering approaches such as feature extraction and feature selection can reduce these effects.
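The exponential growth can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that every feature is discretized into 10 bins and that the dataset holds one million samples; both numbers and the function names are hypothetical.

```python
# Hypothetical setup: each feature is discretized into 10 bins.
BINS_PER_FEATURE = 10
N_SAMPLES = 1_000_000  # assumed dataset size, for illustration only


def grid_cells(n_features: int, bins: int = BINS_PER_FEATURE) -> int:
    """Number of distinct bin combinations across all features."""
    return bins ** n_features


def max_coverage(n_features: int, n_samples: int = N_SAMPLES) -> float:
    """Upper bound on the fraction of cells that n_samples points can occupy."""
    return min(1.0, n_samples / grid_cells(n_features))


for d in (1, 2, 3, 6, 10):
    print(f"{d:>2} features: {grid_cells(d):>14,} cells, "
          f"at most {max_coverage(d):.4%} covered")
```

With a fixed number of samples, the share of combinations that the data can cover shrinks rapidly as features are added, which is one way to see why high-dimensional data is sparse.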
In feature extraction, dimensionality reduction is the process of reducing the number of input features while preserving as much of the information in the original data as possible. One widely used method for this in machine learning is Principal Component Analysis (PCA).
PCA is an unsupervised learning algorithm first presented by Karl Pearson in 1901. It applies an orthogonal transformation to convert correlated features into a set of linearly uncorrelated features. PCA finds strong patterns in a dataset by looking for the directions of greatest variance and projecting the high-dimensional data onto the lower-dimensional surface those directions span.
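The decorrelating effect of the orthogonal transformation can be checked numerically. The following is a minimal sketch using NumPy on made-up data; the variable names and the synthetic dataset are assumptions for illustration, not part of any particular library's PCA routine.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Two strongly correlated synthetic features (made-up data).
x1 = rng.normal(size=500)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=500)
X = np.column_stack([x1, x2])

# Center the data, then eigendecompose its covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh is intended for symmetric matrices

# Project onto the eigenvectors (an orthogonal transformation).
scores = Xc @ eigvecs

corr_before = np.corrcoef(X, rowvar=False)[0, 1]
corr_after = np.corrcoef(scores, rowvar=False)[0, 1]
print(f"correlation before: {corr_before:.3f}")
print(f"correlation after:  {corr_after:.3e}")
```

The original features are highly correlated, while the projected features have a correlation that is zero up to floating-point rounding.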
This method is used in machine learning for both predictive modeling and exploratory data analysis. It is often described as closely related to factor analysis, and it has some parallels to regression's line of best fit. PCA reduces the dimensionality of data while preserving important patterns or relationships between variables, and it does not require prior knowledge of the target variables.
PCA measures how much the data varies along each direction, on the premise that directions with large variance carry the most information, and it drops the low-variance directions to reduce dimensionality. High variance can suggest effective class distinctions, but because PCA ignores class labels it does not guarantee them. Its practical uses span diverse fields such as image analysis, movie recommendations, and optimizing resource allocation in communication networks.
The PCA algorithm is based on a few mathematical concepts, such as:
- Variance and covariance
- Eigenvalues and eigenvectors
PCA uses a collection of orthogonal axes called principal components to capture the data's maximum variance. The first component captures the most variance, and each successive component captures the largest remaining variance in a direction orthogonal to the earlier ones. PCA's versatility extends across different fields, including data visualization, feature extraction, and data compression, operating on the premise that vital information lies within the variance of the features.
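Because each component accounts for a share of the total variance, that share is a common guide for deciding how many components to keep. The sketch below uses made-up eigenvalues and a hypothetical cumulative-variance threshold; the function names and the threshold values are assumptions, not a fixed rule.

```python
def explained_variance_ratio(eigenvalues):
    """Share of total variance captured by each component, largest first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return [value / total for value in ordered]


def components_needed(eigenvalues, threshold=0.95):
    """Smallest number of leading components whose cumulative share reaches threshold."""
    cumulative = 0.0
    for k, ratio in enumerate(explained_variance_ratio(eigenvalues), start=1):
        cumulative += ratio
        if cumulative >= threshold:
            return k
    return len(eigenvalues)


# Made-up eigenvalues of a covariance matrix with four features.
eigenvalues = [0.2, 2.9, 0.1, 0.8]
ratios = explained_variance_ratio(eigenvalues)
print([round(r, 3) for r in ratios])
print(components_needed(eigenvalues, threshold=0.90))
```

Here the two largest eigenvalues already account for more than 90% of the total, so two components would satisfy that (assumed) threshold.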
Understanding the basics of principal component analysis (PCA) also helps when preparing data for models such as convolutional neural networks (CNNs): reducing the dimensionality of the input data while preserving its essential characteristics can lower training cost and may improve performance.
Real-World Example for Principal Component Analysis (PCA)
Consider Alex, a data scientist living in a megacity. He faced a daunting challenge: the city's transportation system urgently needed optimization to reduce congestion and improve efficiency. Alex had a vast amount of traffic data, and he turned to Principal Component Analysis (PCA) to unravel the complexities of the city's traffic patterns.
Alex started by collecting data on various factors that influence traffic, including vehicle counts, road conditions, weather conditions, and time of day. He then applied PCA to this multidimensional dataset to extract the most significant components driving traffic behavior.
As the PCA algorithm analyzed the data, Alex discovered hidden patterns and relationships among the different variables. He identified principal components corresponding to key factors such as rush hour congestion, weather-related delays, and road closures due to accidents or construction.
Using the insights gained
from PCA, Alex developed a comprehensive traffic model that can accurately
predict congestion hotspots and potential bottlenecks across the city. By
leveraging the principal components identified through PCA, he created a
simplified yet accurate representation of the megacity's intricate traffic
dynamics.
Some common terms used in the PCA algorithm:
Dimensionality in a
dataset signifies the count of features or variables, denoting the number of
columns present within the data.
Correlation denotes the strength of the linear relationship between two variables. It measures how one variable changes when the other changes. A correlation value lies between -1 and +1, where -1 implies a perfect inverse relationship, +1 denotes a perfect direct relationship, and 0 indicates no linear correlation.
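As a small worked example, the Pearson correlation coefficient can be computed by hand. The sketch below uses made-up numbers and a hypothetical helper named `pearson`; the third series depends on the first but not linearly, which is why its correlation comes out as zero.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


hours = [1, 2, 3, 4, 5]        # made-up values
direct = [2, 4, 6, 8, 10]      # rises with hours
inverse = [10, 8, 6, 4, 2]     # falls as hours rise
symmetric = [4, 1, 0, 1, 4]    # depends on hours, but not linearly

print(pearson(hours, direct))
print(pearson(hours, inverse))
print(pearson(hours, symmetric))
```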
Orthogonal signifies that
variables are unrelated or uncorrelated, resulting in a correlation of zero
between the variable pairs.
Eigenvectors are non-zero vectors that, when multiplied by a square matrix ( A ), produce a new vector ( Av ) that is a scalar multiple of the original vector ( v ). That scalar is the corresponding eigenvalue.
The Covariance Matrix encompasses covariances between pairs of variables.
Main Components in PCA
- Principal components are formed as linear combinations of the original features.
- These components are orthogonal, indicating a correlation of zero between any pair of components.
- The significance of each component diminishes progressively from 1 to n. PC 1 holds the highest importance, whereas the nth PC is the least significant.
Steps of PCA Algorithm
- Split the dataset into a training set and a validation set, so that the PCA transformation can be learned from the training data and then applied to the validation data.
- Arrange the data as a matrix X of independent variables, with one data item per row and one feature per column. The number of columns is the dimensionality of the dataset.
- Standardize the data. Without standardization, a feature with a larger variance is treated as more significant simply because of its scale. Obtain the Z-matrix by subtracting each column's mean from its values and dividing by the column's standard deviation, so that differences in scale have no effect on a feature's importance.
- Compute the covariance matrix of Z by transposing Z and multiplying the transpose by Z (ZᵀZ), then dividing by n − 1, where n is the number of rows.
- Determine the eigenvalues and eigenvectors of the covariance matrix of Z. The eigenvectors give the directions of the high-information axes, and each eigenvalue gives the amount of variance along its eigenvector.
- Sort the eigenvalues from largest to smallest and arrange the corresponding eigenvectors, as columns, in the same order. The resulting sorted eigenvector matrix is P*.
- Compute the new features Z* by multiplying Z by P* (Z* = Z P*). Each observation becomes a linear combination of the original features, and the columns of Z* are uncorrelated with one another.
- Decide how many of the new features to keep. Retain the leading components, which carry most of the variance, and drop the rest.
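The steps from standardization through projection can be sketched with NumPy as follows. This is one possible from-scratch implementation on synthetic data, not a reference implementation: the function name `pca_from_scratch` and the dataset are assumptions, and the train/validation split is left out to keep the example short.

```python
import numpy as np


def pca_from_scratch(X, n_components):
    """Standardize, build the covariance matrix, eigendecompose, sort, project."""
    X = np.asarray(X, dtype=float)

    # Standardize: subtract each column's mean and divide by its standard deviation.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # Covariance matrix of Z: Z^T Z / (n - 1).
    n_samples = Z.shape[0]
    cov = Z.T @ Z / (n_samples - 1)

    # Eigenvalues and eigenvectors (eigh is intended for symmetric matrices).
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Sort from largest to smallest eigenvalue; reorder eigenvector columns to match.
    order = np.argsort(eigvals)[::-1]
    eigvals, P_star = eigvals[order], eigvecs[:, order]

    # Project: Z* = Z P*, then keep only the leading components.
    Z_star = Z @ P_star
    return Z_star[:, :n_components], eigvals, P_star


rng = np.random.default_rng(42)
base = rng.normal(size=(200, 2))
# Synthetic data: 4 features built from 2 underlying signals plus a little noise.
X = np.column_stack([
    base[:, 0],
    2.0 * base[:, 0] + 0.1 * rng.normal(size=200),
    base[:, 1],
    -base[:, 1] + 0.1 * rng.normal(size=200),
])

scores, eigvals, P_star = pca_from_scratch(X, n_components=2)
print("eigenvalues:", np.round(eigvals, 3))
print("variance kept by 2 components:", round(eigvals[:2].sum() / eigvals.sum(), 4))
```

Because the four synthetic features are driven by only two signals, the first two components retain nearly all of the variance. In practice a library routine such as scikit-learn's `PCA` class would usually be preferred over hand-written code.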
Applications of PCA
- PCA is useful for reducing dimensionality in many artificial intelligence applications, including computer vision and image compression.
- It can unveil new or concealed patterns within high-dimensional data, with applications spanning fields like finance, data mining, and psychology.
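Step 1 – Standardization
$$Z = \frac{X - \mu}{\sigma}$$
In the above equation:
- μ is the mean of each feature.
- σ is the standard deviation of each feature.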
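Step 2 – Covariance Matrix Computation
The covariance matrix holds the covariance between every pair of features. For two features x1 and x2, the sign of the covariance indicates:
- Positive: as x1 increases, x2 also increases.
- Negative: as x1 increases, x2 decreases.
- Zero: no linear relationship.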
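To illustrate how the sign of a covariance reflects the direction of a relationship, here is a minimal sketch that builds a covariance matrix from three made-up feature columns. The helper names `covariance` and `covariance_matrix` are hypothetical, and the sample covariance (dividing by n − 1) is assumed.

```python
def covariance(xs, ys):
    """Sample covariance between two equal-length sequences (divides by n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def covariance_matrix(columns):
    """Square matrix whose (i, j) entry is the covariance of column i with column j."""
    return [[covariance(a, b) for b in columns] for a in columns]


x1 = [1, 2, 3, 4, 5]          # made-up feature values
x2 = [10, 12, 15, 19, 24]     # tends to rise with x1
x3 = [9, 7, 6, 4, 1]          # tends to fall as x1 rises

matrix = covariance_matrix([x1, x2, x3])
for row in matrix:
    print([round(value, 2) for value in row])
```

The diagonal entries are the variances of the individual features, the entry for (x1, x2) is positive, and the entry for (x1, x3) is negative. The matrix is symmetric because the covariance of a with b equals the covariance of b with a.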
Step 3 – Eigenvalues and Eigenvectors
Let A be a square n×n matrix and X be a non-zero vector for which
$$AX = \lambda X$$
for some scalar value λ. Then λ is an eigenvalue of A, and X is an eigenvector of A for that eigenvalue.
We can also write the equation as
$$(A - \lambda I)X = 0$$
where I is the identity matrix of the same size as A.
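For a 2×2 symmetric matrix, such as the covariance matrix of two features, the eigenvalue condition reduces to a quadratic that can be solved by hand. The sketch below assumes a symmetric matrix [[a, b], [b, d]] with a non-zero off-diagonal entry b; the function name and the example matrix are illustrative only.

```python
from math import sqrt


def eigen_2x2_symmetric(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]], assuming b != 0.

    Solves det(A - lambda*I) = 0, that is
    lambda^2 - (a + d)*lambda + (a*d - b*b) = 0.
    """
    if b == 0:
        raise ValueError("this sketch assumes a non-zero off-diagonal entry")
    trace = a + d
    root = sqrt((a - d) ** 2 + 4 * b * b)  # discriminant, never negative
    pairs = []
    for lam in ((trace + root) / 2, (trace - root) / 2):
        # (A - lambda*I) X = 0 is satisfied by X = (b, lambda - a).
        x0, x1 = b, lam - a
        norm = sqrt(x0 * x0 + x1 * x1)
        pairs.append((lam, (x0 / norm, x1 / norm)))
    return pairs


A = [[2.0, 1.0], [1.0, 2.0]]
for lam, (x0, x1) in eigen_2x2_symmetric(A[0][0], A[0][1], A[1][1]):
    # Check A X = lambda X component by component.
    ax = (A[0][0] * x0 + A[0][1] * x1, A[1][0] * x0 + A[1][1] * x1)
    print(f"lambda = {lam:.1f}, X = ({x0:.4f}, {x1:.4f}), "
          f"A X = ({ax[0]:.4f}, {ax[1]:.4f})")
```

For this matrix the eigenvalues are 3 and 1, and in each case A X equals the eigenvector scaled by its eigenvalue. For larger matrices a numerical routine is the practical choice.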
Advantages of PCA
- Dimensionality Reduction: PCA is renowned for reducing the number of variables in a dataset, simplifying data analysis, enhancing performance, and facilitating data visualization.
- Feature Selection: PCA can help identify the most important variables in a dataset, since the weights of the leading components show which original variables contribute most to the variance. This is especially beneficial in scenarios with numerous variables that are challenging to prioritize.
- Data Visualization: Utilizing PCA, high-dimensional data can be represented in two or three dimensions, aiding easier interpretation and visualization.
- Multicollinearity Management: PCA addresses multicollinearity issues in regression analysis by identifying correlated variables and generating uncorrelated ones for regression modeling.
- Noise Reduction: PCA contributes to noise reduction by eliminating principal components with low variance, which can improve the signal-to-noise ratio and unveil underlying data structures.
- Data Compression: By keeping a smaller set of components that represent the majority of the data variability, PCA aids in data compression. This dimensionality reduction not only lowers storage requirements but also enhances processing performance.
- Outlier Detection: PCA can be used to identify outliers as data points that depart considerably from the norm in the principal component space.
Disadvantages and Limitations of PCA
- Interpretability: Because principal components are linear combinations of the original variables, the outcomes of PCA can be hard to interpret and to explain to others.
- Data Scaling: PCA's performance can be affected by data scaling. Inappropriate data scaling might undermine the effectiveness of PCA, warranting careful data scaling before its application.
- Information Loss: Reducing variables via PCA may result in information loss, and the amount lost depends on the number of principal components retained. Thus, careful selection of principal components is essential to mitigate the risk of excessive information loss.
- Linearity Assumption: PCA operates under the assumption of linear relationships between variables. Its effectiveness may decrease where non-linear relationships are present.
- Computational Complexity: PCA can be resource-intensive for extensive datasets, particularly when the dataset contains numerous variables.
- Overfitting: Overfitting can happen when a model is trained on a small dataset or with an excessive number of principal components, which might impair the model's ability to generalize to new data.
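The trade-off between the number of retained components and information loss can be measured as reconstruction error. The following is a rough sketch with NumPy on synthetic data; the function name `reconstruction_error` and the dataset are assumptions, and for brevity the data is only centered, not standardized.

```python
import numpy as np


def reconstruction_error(X, n_components):
    """Mean squared error after compressing X to n_components and mapping back."""
    mean = X.mean(axis=0)
    Xc = X - mean
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    compressed = Xc @ top                  # reduced representation
    restored = compressed @ top.T + mean   # back in the original feature space
    return float(np.mean((X - restored) ** 2))


rng = np.random.default_rng(7)
# Synthetic data: 5 features driven mostly by 2 hidden signals, plus noise.
signals = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 5))
X = signals @ mixing + 0.05 * rng.normal(size=(300, 5))

for k in range(1, 6):
    print(k, f"{reconstruction_error(X, k):.6f}")
```

The error never increases as components are added and falls to essentially zero once all five are kept. On this synthetic data most of the loss disappears after two components, because only two signals drive the features.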