Kernels are what make SVMs special. Here’s an overview from a presentation that I made recently at my workplace. I am pasting a few slides that touch on this topic.
The parameters that an SVM learns are as follows:
Dot products are central to SVMs: both the learned parameters (the alphas) and the predictions depend only on dot products between pairs of data points.
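For reference, here is the standard dual formulation in the usual textbook notation (a sketch, not a reproduction of the slide): the learned parameters are one multiplier α_i per training point plus a bias b, and the prediction for a new point x uses the data only through dot products:

$$ f(x) = \mathrm{sign}\!\left( \sum_i \alpha_i \, y_i \, (x_i \cdot x) + b \right) $$

The α_i themselves come from maximizing the dual objective, which again touches the data only via dot products:

$$ \sum_i \alpha_i \;-\; \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j \, y_i y_j \, (x_i \cdot x_j) \qquad \text{subject to } 0 \le \alpha_i \le C, \;\; \sum_i \alpha_i y_i = 0 $$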
When a linear classifier doesn’t work well, we can project the data into a higher-dimensional space by adding features such as x², x³ and so on. In that higher-dimensional space, a linear hyperplane can often separate the classes better. These extra dimensions can be created by hand through feature engineering. But there’s another way to get the same effect without manually constructing the high-dimensional projections.
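As a small illustration of the feature-engineering route (a sketch using scikit-learn; the toy data and parameter choices are just for demonstration), a 1-D problem that no single threshold can separate becomes linearly separable once we add an x² feature:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy 1-D data: class 1 sits in the middle, class 0 on both sides,
# so no single threshold on x separates them.
X = np.array([-3.0, -2.5, -2.0, -0.5, 0.0, 0.5, 2.0, 2.5, 3.0]).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0])

# Project into a higher dimension by hand: (x) -> (x, x^2).
X_lifted = np.hstack([X, X ** 2])

# In the lifted space a linear SVM separates the classes easily:
# the boundary is roughly x^2 = constant, i.e. a pair of thresholds in 1-D.
clf = LinearSVC(C=1.0).fit(X_lifted, y)
print(clf.score(X_lifted, y))  # should be 1.0 on this separable toy data
```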
Using a kernel function on two data points, we can get all the information required to build an SVM classifier in the higher-dimensional space, without ever actually projecting the data into it. How neat is that?
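To make that concrete, here is a small numerical check with NumPy (a sketch; phi below is the standard explicit feature map for the degree-2 polynomial kernel): the kernel value (x·z)² computed in the original 2-D space equals the dot product of the explicitly projected 3-D vectors, so the SVM can work with the kernel and never build the projection.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for 2-D input: (x1, x2) -> (x1^2, x2^2, sqrt(2)*x1*x2)."""
    x1, x2 = v
    return np.array([x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

def poly_kernel(a, b):
    """Degree-2 polynomial kernel, computed directly in the original 2-D space."""
    return np.dot(a, b) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# The kernel on the raw 2-D points equals the dot product of the 3-D projections,
# so we get the higher-dimensional dot product without ever forming phi(x).
print(poly_kernel(x, z))        # 1.0 -> (1*3 + 2*(-1))^2 = 1
print(np.dot(phi(x), phi(z)))   # 1.0 -> same value via the explicit projection
```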
We can even do classification in infinite dimensions!
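The RBF (Gaussian) kernel is the usual example: its implicit feature space is infinite-dimensional, yet an SVM trains with it just as easily as with any other kernel, because training only ever needs the matrix of kernel values between pairs of points. A minimal sketch using scikit-learn's SVC and the make_moons toy dataset (both chosen here purely for illustration):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: not linearly separable in 2-D.
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

# The RBF kernel corresponds to an infinite-dimensional feature space,
# but fitting only ever evaluates kernel values between pairs of points.
clf = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)
print(clf.score(X, y))
```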