Exploring Support Vector Machines
If we learn more from exceptions than we do from patterns, what keeps us from seeing a panda staring us in the eye?
The essence of Data Science isn’t looking at the data but developing our ability to analyze contextual relationships from differing dimensions. Support vector machines are an excellent example of how an algorithm can help us understand the perspective from which we look at the data, facilitate its transformation into alternate dimensions, and develop our ability to analyze realities that elude management’s naked eye. What exactly are support vector machines, how do they work, how do they differ from other machine learning algorithms, and what are their use scenarios in data science?
Support vector machines (SVMs) are supervised learning models used for classification and regression analysis, as well as for outlier detection. They work by constructing one or more hyperplanes that separate classes of data. Although the original SVM algorithm was published as a linear classifier almost sixty years ago by Vapnik and Chervonenkis, later work has extended it to non-linear classification and to unsupervised learning. Their robustness, efficiency and relatively quick training times have made them one of the most widely used classification algorithms today.
The support vector machine identifies the coefficients that best separate the data classes. These coefficients define a hyperplane: a line in two-dimensional space, a plane in three dimensions, and so on in higher-dimensional spaces. In the standard formulation, the solution depends on the data only through dot products between feature vectors rather than on the individual coordinates of each feature. The distance between the hyperplane and the closest data points is referred to as the margin. Support vectors are the data points that lie closest to the hyperplane and so mark the edges of the margin separating the classes. The optimal hyperplane between two classes is the one that provides the largest margin.
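To make the ideas of margin and support vectors concrete, here is a minimal sketch in plain Python. It does not train an SVM; it only compares two hand-picked candidate hyperplanes on a made-up four-point data set and keeps the one with the larger margin. The data, the candidate coefficients and all function names are assumptions chosen for illustration.

```python
import math

# Toy 2-D data, invented for illustration: (point, label) with label +1 or -1.
points = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    return (w[0] * x[0] + w[1] * x[1] + b) / math.hypot(w[0], w[1])

def margin(w, b, data):
    """Distance from the hyperplane to the closest point.

    Returns 0.0 if the hyperplane fails to separate the two classes.
    """
    closest = min(label * signed_distance(w, b, x) for x, label in data)
    return max(closest, 0.0)

def support_vectors(w, b, data, tol=1e-9):
    """Points whose distance to the hyperplane equals the margin."""
    m = margin(w, b, data)
    return [x for x, label in data
            if abs(label * signed_distance(w, b, x) - m) < tol]

# Two hypothetical separating hyperplanes, given as (w, b).
candidates = {
    "diagonal": ((1.0, 1.0), -2.0),  # the line x1 + x2 = 2
    "vertical": ((1.0, 0.0), -1.0),  # the line x1 = 1
}

# Keep the candidate with the largest margin.
best = max(candidates, key=lambda name: margin(*candidates[name], points))

for name, (w, b) in candidates.items():
    print(name, "margin:", round(margin(w, b, points), 3),
          "support vectors:", support_vectors(w, b, points))
print("largest margin:", best)
```

Both candidates separate the two classes, but the diagonal line sits further from its closest points, which is why a maximum-margin classifier would prefer it. A real SVM searches over all possible hyperplanes rather than a short list of candidates.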
SVMs were originally a derivative of the Perceptron algorithm, and their use was limited to data sets that could be analyzed by linear regression or classification. In the early 1990s, Boser, Guyon and Vapnik suggested applying a “kernel trick” to create nonlinear classifiers from maximum-margin hyperplanes. The kernel trick uses linear, polynomial, radial or sigmoid kernel functions to map the input space implicitly into a different feature space, or “higher” dimension, of reality. At the turn of the century, Siegelmann and Vapnik extended the analysis by developing support vector clustering, which uses the statistics of support vectors to facilitate the categorization of unlabeled data.
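The kernel trick can be illustrated with a small worked example. The sketch below, in plain Python, assumes a degree-2 polynomial kernel on two-dimensional inputs and checks that it gives the same number as an ordinary dot product taken in an explicit three-dimensional feature space. The sample points and function names are hypothetical and serve only to show the equivalence.

```python
import math

def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed in the original 2-D input space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def feature_map(x):
    """Explicit 3-D feature space that corresponds to the kernel above."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Two made-up input points.
x = (1.0, 2.0)
z = (3.0, -1.0)

kernel_value = poly_kernel(x, z)                       # stays in 2-D
explicit_value = dot(feature_map(x), feature_map(z))   # maps to 3-D first

print("kernel in input space:      ", kernel_value)
print("dot product in feature space:", explicit_value)
print("equal:", math.isclose(kernel_value, explicit_value))
```

Because the algorithm only ever needs dot products between data points, replacing each dot product with a kernel evaluation lets it work in the higher-dimensional space without computing the mapped coordinates at all.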
Support vector machines are used in a wide variety of industrial settings. The SVM algorithm has been widely applied to image recognition and classification in the biological sciences. In the field of computational biology, for example, SVMs have proven to be among the most effective methods for protein remote homology detection. Elsewhere, SVMs are frequently used in text and hypertext categorization because they can significantly reduce the need for labeled training sets. We also use SVMs for spatial and spatiotemporal environmental data analysis and modeling. Finally, SVMs have been reported to achieve markedly better search accuracy than traditional query refinement schemes.
Although neural networks are often used as substitutes for support vector machines, there are important differences between the two. Neural networks are inherently heuristic approaches to prediction, whereas SVMs are theoretically founded. SVMs can often be trained more quickly, and tend to be less prone to overfitting, than their counterparts. By design, SVM training finds a globally optimal set of parameters, whereas neural networks rely on incremental gradient descent, which offers no such guarantee. Non-linear SVMs can also outperform neural networks when projecting into higher-dimensional space: the kernel function’s complexity doesn’t increase with the number of dimensions, whereas the complexity of neural networks increases mechanically with the number of neurons. All this said, neural networks are nonetheless preferred over SVMs when many support vectors are being created, because their prediction speed is higher and their model size smaller.
The panda example illustrates the difficulty we have in distinguishing outliers from generalizable patterns. Human perception and cognition often limit our predictive capacities in visual search. We study space relying on cognitive patterns (in this case the Cs between the circles) that hide the solution from the naked eye. Innovation in management, as in the arts or sciences, begins with recognizing how we perceive the data, and then changing the perspectives (dimensions) from which we look at reality.
Dr. Lee SCHLENKER - The Business Analytics Institute
Previous contributions in the BAI series on basic machine learning algorithms include Artificial Neural Networks: Man vs Machine?, Bayes’ theorem-practice makes perfect and Shark Attack — explaining the use of Poisson regression