Tuesday, February 20, 2024

PRINCIPAL COMPONENT ANALYSIS IN MACHINE LEARNING/PYTHON/ARTIFICIAL INTELLIGENCE

Principal Component Analysis

  • Curse of Dimensionality and Dimensionality Reduction
  • Common Terms and Mathematical Concepts in PCA
  • Steps of PCA Algorithm
  • Applications and Advantages of PCA
  • Disadvantages and Limitations of PCA

The number of features or dimensions in the dataset directly relates to the growth of computational and time costs. Dealing with high-dimensional data often leads to overfitting and decreased model accuracy, a phenomenon known as the "curse of dimensionality."

This issue arises because the number of possible feature combinations grows exponentially as the dimensions increase, making operations more computationally intensive. To mitigate the curse of dimensionality, we can use different feature engineering approaches, such as feature selection and feature extraction.

Dimensionality reduction, a form of feature extraction, is the process of reducing the number of input features while preserving the integrity of the original data. One widely used method in this field is Principal Component Analysis (PCA).

Through orthogonal transformation, PCA, an unsupervised learning algorithm first presented by Karl Pearson in 1901, statistically converts correlated data into a set of linearly uncorrelated features. PCA finds strong patterns in a dataset by seeking the directions of maximum variance and projecting the high-dimensional data onto lower-dimensional surfaces.

This method is used in machine learning for both predictive modeling and exploratory data analysis. Many consider it a more generalized kind of factor analysis, with some parallels to regression's line of best fit. Principal component analysis (PCA) is a useful tool for reducing the dimensionality of data while preserving important patterns or relationships between variables, and as an unsupervised method it does not require prior knowledge of the target variables.

PCA evaluates the variance of each attribute to pinpoint those with significant variations, suggesting effective class distinctions, which in turn aids in reducing dimensionality. Its practical uses span diverse fields such as image analysis, movie recommendations, and optimizing resource allocation in communication networks.

The PCA algorithm is based on mathematical concepts such as:

  • Variance and covariance
  • Eigenvalues and eigenvectors
PCA operates by reducing the dimensionality of a dataset through the discovery of a smaller set of variables that preserves the bulk of information present in the samples. These transformed variables prove beneficial for both regression and classification tasks.
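For instance, scikit-learn's PCA class can produce such transformed variables inside a classification pipeline. The following is a minimal sketch, assuming scikit-learn is available; the wine dataset, the choice of 5 components, and the logistic-regression classifier are illustrative choices, not prescribed here:

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)            # 13 original features
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize, reduce the 13 features to 5 principal components, then classify
model = make_pipeline(StandardScaler(),
                      PCA(n_components=5),
                      LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```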



Principal component analysis (PCA) uses a collection of orthogonal axes called principal components to capture the maximum variance in the data. The first component encapsulates the most substantial variance, with successive components capturing additional, orthogonal variation. PCA's versatility extends across different fields, including data visualization, feature extraction, and data compression, operating on the premise that vital information lies within the variance of the features.

Understanding the basics of principal component analysis (PCA) can greatly enhance the performance and interpretability of convolutional neural networks (CNNs) by reducing the dimensionality of the input data while preserving its essential characteristics.

Real-World Example for Principal Component Analysis (PCA)

Let's suppose there is a megacity where Alex, a data scientist, lives. He found himself facing a daunting challenge: the city's transportation system was in immediate need of optimization to relieve congestion and improve efficiency. Alex has a vast amount of traffic data, and he applies Principal Component Analysis (PCA) to it to unravel the complexities of the city's traffic patterns.

Alex started by collecting data on various factors that influence traffic, including vehicle counts, road conditions, weather conditions, and time of day. With this multidimensional dataset, he applied PCA to extract the most significant components driving traffic behavior.

As the PCA algorithm analyzed the data, Alex discovered hidden patterns and relationships among the different variables. He identified principal components representing key factors such as rush hour congestion, weather-related delays, and road closures due to accidents or construction.

Using the insights gained from PCA, Alex developed a comprehensive traffic model that can accurately predict congestion hotspots and potential bottlenecks across the city. By leveraging the principal components identified through PCA, he created a simplified yet accurate representation of the megacity's intricate traffic dynamics.

Some Common terms used in the PCA algorithm:

Dimensionality in a dataset signifies the count of features or variables, denoting the number of columns present within the data.

Correlation denotes the relationship between two values. It measures how one value changes when the other changes. A correlation value lies between -1 and +1, where -1 implies an inverse relationship, +1 denotes a direct relationship, and 0 indicates no correlation.

Orthogonal signifies that variables are unrelated or uncorrelated, resulting in a correlation of zero between the variable pairs.

Eigenvectors are non-zero vectors v that, when multiplied by a square matrix A, produce a new vector Av that is a scalar multiple of the original vector v.

The Covariance Matrix encompasses covariances between pairs of variables.

Principal Components in PCA

The resulting new features after applying PCA are termed Principal Components (PCs). These components can either match the number of original features or be fewer. Key properties of principal components include:
  • Principal components are formed as linear combinations of the original features.
  • These components are orthogonal, indicating a correlation of zero between variable pairs.
  • The significance of each component diminishes progressively from 1 to n. PC 1 holds the highest importance, whereas the nth PC is the least significant.
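These properties can be checked numerically. Below is a small sketch, assuming scikit-learn and NumPy are available; the iris dataset is only an example:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA().fit(X)                      # keep all 4 components

# Orthogonality: the component vectors are mutually perpendicular unit vectors,
# so the matrix of components times its transpose is the identity matrix
print(np.allclose(pca.components_ @ pca.components_.T, np.eye(4)))   # True

# Decreasing importance: explained variance falls from PC1 down to PC4
print(pca.explained_variance_ratio_)
```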
Steps of PCA algorithm
  • To begin, split the dataset into a training set and a validation set.
  • Arrange the data logically as a matrix X, with the data items in the rows and the features in the columns; the number of columns gives the dimensionality of the dataset.
  • Standardize the data to ensure consistency throughout the dataset. A feature with a larger variance would otherwise dominate the analysis, so subtract each column's mean and divide by its standard deviation to obtain the standardized matrix Z. This guarantees that differences in scale have no effect on a feature's importance.
  • Compute the covariance matrix of Z by multiplying the transpose of Z by Z.
  • Determine the eigenvalues and eigenvectors of this covariance matrix. The eigenvectors give the directions of the high-information axes, and the corresponding eigenvalues indicate how much variance each direction captures.
  • Sort the eigenvalues in decreasing order and arrange the corresponding eigenvectors in the same order to obtain the sorted matrix P*.
  • Compute Z* = ZP* to obtain the new features. Each observation becomes a linear combination of the original features, and the columns of Z* are uncorrelated.
  • Finally, perform feature selection: keep the principal components that capture most of the variance and eliminate the remaining ones.
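The steps above translate almost line by line into NumPy. Here is a minimal sketch, assuming NumPy is available; the random toy data and the choice of keeping two components are illustrative:

```python
import numpy as np

# Toy data matrix X: rows are data items, columns are features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Standardize each column (zero mean, unit variance) to get Z
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Covariance matrix of Z (Z^T Z scaled by n - 1)
cov = (Z.T @ Z) / (Z.shape[0] - 1)

# Eigenvalues and eigenvectors of the covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Sort eigenvalues (and matching eigenvectors) in decreasing order to get P*
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
P_star = eigenvectors[:, order]

# Project the data: Z* = Z P*
Z_star = Z @ P_star

# Keep only the first k principal components
k = 2
Z_reduced = Z_star[:, :k]
print(Z_reduced.shape)                       # (100, 2)
print(eigenvalues / eigenvalues.sum())       # fraction of variance per component
```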
Applications of Principal Component Analysis
  • PCA is useful for reducing dimensionality in many artificial intelligence applications, including computer vision and image compression.
  • It can unveil new or concealed patterns within high-dimensional data, with applications spanning fields like finance, data mining, and psychology.
Step-by-step mathematical explanation of PCA (Principal Component Analysis)
Step 1 – Standardization

The dataset must first be standardized, which means rescaling every variable so that it has a mean of 0 and a standard deviation of 1:

Z = (X - μ) / σ

In the above equation:

μ is the mean of the independent features
σ is the standard deviation of the independent features
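In code this is a simple column-wise operation. A short sketch, assuming NumPy; the small height/weight array is made up for illustration:

```python
import numpy as np

X = np.array([[170.0, 65.0],
              [160.0, 72.0],
              [180.0, 80.0]])   # e.g. height (cm) and weight (kg)

mu = X.mean(axis=0)      # mean of each independent feature
sigma = X.std(axis=0)    # standard deviation of each independent feature

Z = (X - mu) / sigma     # standardized data: zero mean, unit variance per column
print(Z.mean(axis=0), Z.std(axis=0))
```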

Step 2 – Covariance Matrix Computation

The covariance of two variables indicates how much they vary together and therefore serves as a measure of their joint variability. We can use the following formula to find the covariance between two features x1 and x2 over n observations:

cov(x1, x2) = Σ (x1i - x̄1)(x2i - x̄2) / (n - 1)

The value of the covariance can be positive, negative, or zero.
  • Positive: as x1 increases, x2 also increases.
  • Negative: as x1 increases, x2 decreases.
  • Zero: no direct relation.
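A quick numerical check of this formula, assuming NumPy; the two toy variables are made up:

```python
import numpy as np

x1 = np.array([2.0, 4.0, 6.0, 8.0])
x2 = np.array([1.0, 3.0, 5.0, 9.0])

n = len(x1)
cov_manual = np.sum((x1 - x1.mean()) * (x2 - x2.mean())) / (n - 1)

# np.cov returns the full 2x2 covariance matrix; the off-diagonal entry
# is the covariance between x1 and x2
cov_numpy = np.cov(x1, x2)[0, 1]

print(cov_manual, cov_numpy)   # both positive: x2 tends to grow with x1
```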

Step 3 – Determine the principal components by calculating the covariance matrix's eigenvalues and eigenvectors

Let A be a square n×n matrix and X be a non-zero vector for which

AX = λX

for some scalar value λ. Then λ is an eigenvalue of matrix A, and X is the eigenvector of A corresponding to that eigenvalue.

We can also write the equation as

(A - λI)X = 0

where I is the identity matrix with the same dimensions as A. This equation has a non-zero solution X only if (A - λI) is not invertible, that is, if det(A - λI) = 0.
Solving det(A - λI) = 0 gives the eigenvalues λ; substituting each eigenvalue back into the first equation then gives the corresponding eigenvector:
AX = λX
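NumPy can carry out this computation directly. A small sketch verifying AX = λX, assuming NumPy; the 2×2 symmetric matrix is an arbitrary example standing in for a covariance matrix:

```python
import numpy as np

# A symmetric matrix, as a covariance matrix would be
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)

# Verify AX = λX for each eigenpair (eigenvectors are the columns)
for lam, X in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ X, lam * X))   # True

print(eigenvalues)   # the largest eigenvalue marks the first principal direction
```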
Advantages of Principal Component Analysis
  • Dimensionality Reduction: PCA is renowned for reducing the number of variables in a dataset, simplifying data analysis, enhancing performance, and facilitating data visualization.
  • Feature Selection: PCA can be employed for selecting essential variables from a dataset, especially beneficial in scenarios with numerous variables that are challenging to prioritize.
  • Data Visualization: Utilizing PCA, high-dimensional data can be represented in two or three dimensions, aiding easier interpretation and visualization.
  • Multicollinearity Management: Addressing multicollinearity issues in regression analysis is another forte of PCA, which identifies correlated variables and generates uncorrelated ones for regression modeling.
  • Noise Reduction: PCA contributes to noise reduction by eliminating principal components with low variance, effectively improving the signal-to-noise ratio and unveiling underlying data structures.
  • Data Compression: By using a smaller selection of components that represents the majority of the data's variability, PCA aids in data compression. This dimensionality reduction not only lowers storage requirements but also enhances processing performance (see the sketch after this list).
  • Outlier Detection: PCA can be used to identify data points that depart considerably from the rest of the data in the principal component space as outliers.
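As an illustration of the compression and noise-reduction points above, one can reconstruct data from a reduced number of components and measure what is lost. A sketch, assuming scikit-learn and NumPy; the digits dataset and the choice of 16 components are illustrative:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)        # 64 pixel features per image

pca = PCA(n_components=16)                 # keep 16 of the 64 components
X_compressed = pca.fit_transform(X)        # compact representation
X_restored = pca.inverse_transform(X_compressed)

print(X_compressed.shape)                  # (1797, 16)
print(pca.explained_variance_ratio_.sum()) # fraction of variance retained
print(np.mean((X - X_restored) ** 2))      # reconstruction error
```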
Disadvantages of Principal Component Analysis
  • Interpretation Difficulty: PCA produces principal components as linear combinations of the original variables, and explaining what these components mean to others can be challenging.
  • Data Scaling: PCA's performance can be affected by data scaling. Inappropriate data scaling might undermine the effectiveness of PCA, warranting careful data scaling before its application.
  • Information Loss: Reducing variables via PCA may result in information loss, and the amount lost depends on how many principal components are retained. Careful selection of principal components is therefore essential to limit the loss.
  • Linearity Assumption: PCA operates under the assumption of linear relationships between variables, so its effectiveness may decrease when non-linear relationships are present.
  • Computational Complexity: Computationally, PCA can be resource-intensive for extensive datasets, particularly when the dataset contains numerous variables.
  • Overfitting: Overfitting can happen when a model is trained on a small dataset or with an excessive number of principal components, which might impair the model's ability to generalize to new data.
Summary
Principal Component Analysis (PCA) makes data dimensionality reduction possible. The goal is to convert high-dimensional datasets into spaces with fewer dimensions while maintaining the important information. By locating new orthogonal axes, known as principal components, that capture the largest variation in the dataset, it finds the most significant patterns in the data. PCA facilitates noise reduction, streamlines data representation, and speeds up machine learning methods. However, it assumes that variables have linear relationships, so it might not work as well on some nonlinear datasets.

Python Code
Below is an example of PCA in Python:
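This listing is a minimal end-to-end sketch, assuming scikit-learn and matplotlib are installed; the iris dataset and the choice of two components are illustrative:

```python
# A typical PCA workflow: standardize, fit, inspect variance, project to 2D
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# 1. Load a sample dataset (4 features, 3 classes)
X, y = load_iris(return_X_y=True)

# 2. Standardize so every feature has zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X)

# 3. Fit PCA and keep the first two principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# 4. How much variance do the two components capture?
print("Explained variance ratio:", pca.explained_variance_ratio_)

# 5. Visualize the data in the reduced 2-D space
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap="viridis")
plt.xlabel("First principal component")
plt.ylabel("Second principal component")
plt.title("Iris data projected onto two principal components")
plt.show()
```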

