Monday, February 19, 2024

SUPPORT VECTOR MACHINES IN MACHINE LEARNING/PYTHON/ARTIFICIAL INTELLIGENCE

Support vector machine

  • Types of Support Vector Machines
  • How does Support Vector Machine work?
  • Support Vector Machine Terminology
  • Mathematical Intuition of Support Vector Machine
  • Popular Kernel Functions in SVM
  • Advantages of Support Vector Machine
  • Disadvantages of Support Vector Machine 

Support Vector Machines (SVMs) are a notable class of supervised learning models for classification and regression. Fundamentally, SVMs are discriminative classifiers: they identify the optimal hyperplane in feature space that distinguishes different groups of data points. SVMs are effective and highly adaptable, handling linear and nonlinear classification problems, regression analysis, and outlier detection.

These methods find applications in numerous fields, including handwriting recognition, spam detection, text categorization, gene classification, and many more. SVMs are especially effective at handling high-dimensional data and nonlinear relationships.

The primary goal of an SVM is to locate the hyperplane in feature space that maximizes the margin to the nearest data points from the various classes. The hyperplane's dimensionality depends on the dataset's feature count: with two features the hyperplane is simply a straight line, and with three features it becomes a plane. As the number of features increases, the hyperplane occupies a higher-dimensional space, making it harder to visualize. Because the SVM algorithm is easy to follow in Python, this post studies the support vector machine with the help of Python examples.

Example of Support Vector Machine

Let’s suppose we are farmers sorting grains. Normally we would use a sieve, but what if the grains are oddly shaped or very similar in size? In such a situation, a Support Vector Machine (SVM) can act like a smart sorting machine.

SVM excels at classification tasks, like separating the wheat from the chaff (good grains vs. bad). It analyzes the data points (grains) based on features such as size, color, and weight. Let’s take grain size and texture as the features for our case.

With the help of the SVM algorithm, we create a separating line, or hyperplane, in this feature space. The hyperplane strategically maximizes the distance between the two categories, wheat and chaff. Grains (data points) on one side of the line are classified as wheat, and points on the other side are classified as chaff.

A key advantage of the SVM model is that it finds the best separating line even when the data isn't perfectly linear. Suppose some grains are oddly shaped and show up as outliers: the SVM handles them by finding the best possible hyperplane despite such complexities.

Once the SVM is trained on the labeled data (wheat and chaff samples), it can classify new grains (data points) it has never seen. The SVM analyzes the features of each new grain and assigns it to a class based on its position relative to the hyperplane.
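To make the grain-sorting analogy concrete, here is a minimal sketch using scikit-learn. The grain measurements below are made-up illustrative numbers, not real agricultural data:

import numpy as np
from sklearn.svm import SVC

# Made-up training data: each row is [grain size (mm), texture score].
X_train = np.array([
    [6.1, 0.80], [5.8, 0.75], [6.4, 0.85], [6.0, 0.78],   # wheat
    [3.2, 0.30], [2.9, 0.25], [3.5, 0.35], [3.1, 0.28],   # chaff
])
y_train = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # 1 = wheat, 0 = chaff

# Fit a linear SVM: it finds the separating line with the widest margin.
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

# Classify grains the model has never seen.
new_grains = np.array([[5.9, 0.77], [3.0, 0.31]])
print(clf.predict(new_grains))  # expected output: [1 0] -> wheat, chaff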

Types of Support Vector Machines

We can divide Support Vector Machines into two categories:

Linear Support Vector Machines (SVM) are used for linearly separable datasets, meaning the classes can be discriminated effectively with a single straight line. This is the typical case when the data points divide neatly into two classes across a linear border; a linear SVM classifier then builds the most efficient decision boundary between the classes.

In contrast, non-linear Support Vector Machines (SVM) are used when a straight line cannot separate the data. Because of the way the data points are arranged, a more complex, non-linear decision boundary may be needed to split the classes appropriately. A non-linear SVM classifier handles such data by converting it, using techniques like kernel functions, into a higher-dimensional space where linear separation becomes possible.
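In practice the two types differ only in the kernel we choose. A quick sketch (assuming scikit-learn and its make_moons toy dataset) contrasting them on data a straight line cannot separate:

from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-circles: not separable by a straight line.
X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

linear_clf = SVC(kernel="linear").fit(X, y)  # linear SVM
rbf_clf = SVC(kernel="rbf").fit(X, y)        # non-linear SVM (RBF kernel)

print("linear accuracy:", linear_clf.score(X, y))
print("rbf accuracy:   ", rbf_clf.score(X, y))
# The RBF kernel typically scores noticeably higher on this data.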

How does Support Vector Machine work?

Linear SVM - A reasonable choice for the best hyperplane is the one that has the largest separation, or margin, between the two classes.

[Image: candidate separating lines between the two classes, including line L2 (source: original)]

The optimal hyperplane is the one that maximizes the distance to the closest data point on either side. This specific type of hyperplane is called a "maximum-margin hyperplane", or hard margin. With that criterion, all we have to do is select line L2 from the figure above. Now let's consider another scenario, shown in the image below.
[Image: dataset with a single blue point inside the red region (source: original)]

In the image above, there is a single blue dot inside the red region, so how does the SVM operate? To handle this kind of data, the SVM establishes the maximum margin in the same way it did for the previous dataset, and then adds a penalty each time a point crosses the margin. These circumstances are referred to as "soft margins". When a soft margin is needed, the SVM minimizes an objective of the form (1/margin) + λ (∑ penalties). The penalty is usually applied through the hinge loss: if there is no violation there is no loss, and otherwise the loss grows in proportion to the distance of the violation.
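The hinge loss itself is a one-liner. A minimal sketch with a made-up weight vector, bias, and points, just to show how the penalty grows with the violation:

import numpy as np

w = np.array([1.0, -1.0])    # hypothetical weight vector
b = 0.0                      # hypothetical bias

X = np.array([[2.0, -1.0],   # well on the correct side
              [0.4, 0.1],    # inside the margin
              [-1.0, 1.0]])  # on the wrong side
t = np.array([1, 1, 1])      # labels in {-1, +1}

# Hinge loss: zero when t * (w.x + b) >= 1, otherwise it grows
# linearly with the distance of the violation.
margins = t * (X @ w + b)
loss = np.maximum(0.0, 1.0 - margins)
print(loss)  # [0.   0.7  3. ]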

Non-linear SVM - So far we have looked only at linear SVMs and linearly separable data; now we turn to datasets that are not linearly separable. Such data cannot be separated by a single straight line, as the image below shows.

[Image: a non-linearly separable dataset, one class encircled by the other (source: original)]

From the diagram above we can conclude that the data points cannot be separated cleanly by a single straight line (and with more than two classes, a single line certainly cannot separate them). A circular boundary, however, can separate the two classes. We therefore add another coordinate Z, computed from X and Y as Z = X² + Y². Adding this third dimension changes the graph into the following:

[Image: the data lifted into 3-D with Z = X² + Y² (source: original)]

Because the diagram above depicts the 3-D space, the classes now appear separable by a plane of constant Z, parallel to the x-y plane. If we convert back into 2-D space at Z = 1, it looks like the image below:


[Image: projection back to 2-D showing a circular decision boundary (source: original)]

In this view the dataset is separated using the Z coordinate at Z = 1, which corresponds to a circle of radius 1 in the original plane, since X² + Y² = 1 there.
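The same lift can be reproduced in a few lines of NumPy. A sketch with synthetic circular data (the radii and noise level are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)

# Inner class: points near the origin; outer class: points on a ring of radius 2.
inner = rng.normal(0.0, 0.25, size=(50, 2))
angles = rng.uniform(0.0, 2.0 * np.pi, size=50)
outer = np.column_stack([2.0 * np.cos(angles), 2.0 * np.sin(angles)])

# Add the third coordinate Z = X^2 + Y^2.
z_inner = (inner ** 2).sum(axis=1)
z_outer = (outer ** 2).sum(axis=1)

# In the lifted space a single threshold on Z separates the classes,
# e.g. Z = 1, which maps back to the circle x^2 + y^2 = 1 in 2-D.
print(z_inner.max() < 1.0 < z_outer.min())  # True for this seed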

Support Vector Machine Terminology

A hyperplane serves as the decision boundary, partitioning the feature space between classes in Support Vector Machines (SVM). For linearly separable datasets, the hyperplane is represented by a linear equation, typically expressed as w · x + b = 0. Here 'w', 'x', and 'b' represent the weight vector, the input features, and the bias term, respectively.

Support vectors, the data points closest to the hyperplane, are essential for determining both the hyperplane and the margin. The margin is the distance between the support vectors and the hyperplane. Maximizing the margin is a crucial goal of SVMs, since a larger margin typically leads to better classification performance.

Support vector machines (SVMs) convert the input data points into a feature space with additional dimensions by using mathematical functions known as kernel functions. This adjustment makes it feasible to discover nonlinear decision boundaries even when the data points are not linearly separable in their original input space. Commonly used kernels include the linear, polynomial, radial basis function (RBF), and sigmoid kernels.

Support vector machines employ two types of margins: hard and soft. A hard-margin hyperplane separates the data points of the different categories flawlessly, without misclassifications; a soft-margin hyperplane allows a certain number of violations or misclassifications, and is usually used to deal with outliers or imperfect separability.

In SVM, the regularization parameter 'C' balances margin maximization against the penalty for misclassifications. A larger 'C' value translates into a harsher penalty, which implies fewer misclassifications but a narrower margin.
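Most libraries expose this trade-off directly as the C argument. A sketch (the dataset and C values are illustrative):

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Slightly overlapping blobs, so that C has a visible effect.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.5, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # A small C tolerates violations (a wide, soft margin); a large C
    # penalizes them harshly (a narrower margin, fewer violations).
    print(f"C={C:>6}: support vectors = {clf.n_support_.sum()}")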

Hinge loss is a popular loss function in SVMs, used to penalize margin violations and misclassifications. SVM objective functions often combine a hinge-loss component with a regularization term.

The dual problem in support vector machines (SVM) is to optimize the Lagrange multipliers associated with the support vectors. This formulation allows the use of kernel methods, resulting in more efficient computation, especially in high-dimensional environments.

Mathematical intuition of Support Vector Machine

Assume we are given a binary classification task with two labels, +1 and -1, and a training dataset with features X and labels Y. The equation of the linear hyperplane can then be written as:

w · x + b = 0

The vector w denotes the normal of the hyperplane, that is, the direction perpendicular to it. The parameter b represents the hyperplane's offset from the origin along that normal.

The distance between a data point x_i and the decision boundary can be calculated as:

d_i = (w · x_i + b) / ||w||

where ||w|| is the Euclidean norm of the weight vector w.
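A quick numeric check of this formula, with hypothetical values for w, b, and x_i:

import numpy as np

w = np.array([3.0, 4.0])    # hypothetical normal vector, ||w|| = 5
b = -5.0                    # hypothetical offset
x_i = np.array([2.0, 1.0])  # a data point

distance = (w @ x_i + b) / np.linalg.norm(w)
print(distance)  # (6 + 4 - 5) / 5 = 1.0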

For a linear SVM classifier, the prediction rule is ŷ = 1 if w · x + b ≥ 0, and ŷ = 0 otherwise.

Optimization

For a hard margin linear SVM classifier, the optimization problem is:

minimize (1/2) ||w||²   subject to   t_i (w · x_i + b) ≥ 1 for all i

In the above formulation, t_i is the target label of the i-th training instance: t_i = -1 denotes a negative instance (y_i = 0) and t_i = 1 denotes a positive instance (y_i = 1). In other words, the decision boundary must satisfy the constraint t_i (w · x_i + b) ≥ 1 for every training point.

For a soft margin linear SVM classifier, slack variables ζ_i relax the constraints:

minimize (1/2) ||w||² + C ∑ ζ_i   subject to   t_i (w · x_i + b) ≥ 1 − ζ_i and ζ_i ≥ 0

Dual Problem: to solve the SVM optimization problem, one determines the Lagrange multipliers associated with the support vectors. This reformulation is known as the dual problem. The optimal Lagrange multipliers α_i maximize the dual objective function:

maximize ∑ α_i − (1/2) ∑∑ α_i α_j t_i t_j K(x_i, x_j)   subject to   0 ≤ α_i ≤ C and ∑ α_i t_i = 0

In the above equation:

  • α_i (alpha subscript i) is the Lagrange multiplier associated with the i-th training sample.
  • K(x_i, x_j) is the kernel function, which computes the similarity between the samples x_i and x_j. Kernels allow the SVM to handle nonlinear classification problems by implicitly mapping the samples into a higher-dimensional feature space.
  • ∑ α_i represents the sum of all the Lagrange multipliers.

Once the dual problem has been solved, the decision boundary of the support vector machine can be characterized using the support vectors and the optimal Lagrange multipliers. The support vectors are exactly the training points with α_i > 0, and the decision function is:

f(x) = sign( ∑ α_i t_i K(x_i, x) + b )
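scikit-learn exposes the pieces of this solution after fitting, so the decision function can be rebuilt by hand from the dual coefficients. A sketch (the dataset is an arbitrary toy choice):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

# Only the training points with alpha_i > 0 are kept as support vectors.
sv = clf.support_vectors_  # the support vectors themselves
coef = clf.dual_coef_      # alpha_i * t_i, one per support vector

# Rebuild f(x) = sum_i alpha_i t_i K(x_i, x) + b with a linear kernel.
x_new = X[:1]
f_manual = coef @ (sv @ x_new.T) + clf.intercept_
print(np.allclose(f_manual, clf.decision_function(x_new)))  # True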

Popular kernel functions in SVM

By converting a low-dimensional input space into a higher-dimensional one, SVM kernels make problems that were non-separable in the original space separable, which is especially helpful for non-linear separation problems. All we need to do is introduce the kernel and let the optimization partition the data according to the labels; the kernel performs the extremely complex data transformations implicitly. The most commonly used kernels are the linear, polynomial, radial basis function (RBF), and sigmoid kernels, compared in the sketch below.
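All four of these kernels map directly onto the kernel argument of scikit-learn's SVC. A comparison sketch on concentric circles, a classic non-linearly separable benchmark:

from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=0)

for kernel in ("linear", "poly", "rbf", "sigmoid"):
    score = cross_val_score(SVC(kernel=kernel), X, y, cv=5).mean()
    print(f"{kernel:>8}: {score:.2f}")
# The RBF kernel separates the circles well; the linear kernel cannot.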

Advantages of Support Vector Machine

  • It is effective with high-dimensional data.
  • It is memory efficient, because the decision function uses only a subset of the training points (the support vectors).
  • Different kernel functions can be used for the decision function, and it is also possible to specify custom kernels (see the sketch below).
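The last bullet works by passing any callable that returns a Gram matrix. A minimal sketch of a custom (hand-rolled linear) kernel:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

def my_kernel(A, B):
    # A hand-rolled linear kernel: the matrix of pairwise dot products.
    return A @ B.T

X, y = make_blobs(n_samples=80, centers=2, random_state=0)
clf = SVC(kernel=my_kernel).fit(X, y)  # SVC accepts a callable kernel
print(clf.score(X, y))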

Disadvantages of Support Vector Machine

  • The Support Vector Machine method is not a good fit for very large datasets: training time grows quickly with the number of samples.
  • It demands careful selection of the kernel and of parameters such as C to achieve good results.
Applications of Support Vector Machine

Many different types of businesses use the Support Vector Machine (SVM), a versatile machine learning method, to solve regression and classification problems. SVM determines the optimal hyperplane in feature space that best divides the classes, maximizing the margin between them; this margin improves generalization to new data and makes the model robust to noise. SVMs perform exceptionally well in high-dimensional domains, including situations with more attributes than samples, and kernel functions let them deal with non-linear data by projecting the original feature space into a higher-dimensional one where the classes are more easily distinguished.

SVMs are used across many fields, such as text classification, image recognition, bioinformatics, and finance. Text classification uses SVMs for applications such as document classification, sentiment analysis, and spam detection. In image recognition, SVMs find use in object detection, face recognition, and medical image analysis. Bioinformatics applications include protein structure prediction, gene expression analysis, and drug discovery. Financial institutions rely on SVMs for a variety of tasks, including stock price forecasting, credit scoring, and loss prevention. With their robust and efficient solutions, SVMs improve data analysis and decision-making across many industries.

Summary

Support Vector Machines (SVM) serve as versatile algorithms employed across classification, regression, and outlier detection tasks. Their primary objective is to identify the optimal hyperplane that segregates the classes in a dataset by maximizing the margin between them. Operating in high-dimensional spaces, SVMs leverage diverse kernel functions to handle intricate data patterns and nonlinear relationships effectively. While SVMs offer robust performance, they can pose computational challenges and demand meticulous parameter selection to achieve optimal results.

Python Code

Below we define a support vector machine in Python.
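This is a minimal end-to-end sketch using scikit-learn; the Iris dataset, the RBF kernel, and the parameter values are illustrative choices, not the only ones possible:

from sklearn import datasets
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative dataset: the classic Iris flowers (3 classes, 4 features).
X, y = datasets.load_iris(return_X_y=True)

# Hold out a test set so the evaluation reflects unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# SVMs are sensitive to feature scale, so standardize first.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# RBF-kernel SVM with illustrative C and gamma settings.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))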

