More careful treatment of the link between hard and soft-margin SVM
With separable data, we seek to maximize the geometric margin, and that gives hard-margin SVM. Without separable data, we need to be clear about what the "geometric margin" is and how it connects to slack. In particular, rather than just the slack-penalty perspective, can we formulate soft-margin SVM as: maximize the geometric margin, subject to a cap on the total slack? Or, for a given geometric margin, find the separator that minimizes the total slack? (A rough sketch of both is below.) Christoph Lambert's Kernel Methods for Object Recognition gives a nice set of slides.
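To make that concrete, here is roughly what I have in mind (the notation is mine, not from Brett's notes; r is a slack budget and M a target margin):

```latex
% Hard-margin SVM: maximize the geometric margin directly,
%   max_{w,b} min_i y_i (w^T x_i + b) / ||w||,
% or equivalently  min (1/2)||w||^2  s.t.  y_i (w^T x_i + b) >= 1 for all i.

% Soft margin, version 1: maximize the margin subject to a total slack budget r:
\[
\min_{w,b,\xi}\ \tfrac{1}{2}\|w\|^2
\quad \text{s.t.}\quad y_i(w^\top x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ \sum_{i=1}^n \xi_i \le r
\]

% Soft margin, version 2: for a fixed geometric margin M, minimize the total slack:
\[
\min_{w,b,\xi}\ \sum_{i=1}^n \xi_i
\quad \text{s.t.}\quad y_i(w^\top x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ \|w\| \le 1/M
\]
```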
The "traditional form" of soft margin SVM that Brett gives in 3-SVM-Notes.pdf is basically the Tikhonov form of SVM. The formulations I mention above the Ivanov form and another form I forget the name of.
Might be nice to have a picture with two different soft-margin SVM fits, showing the different margins and perhaps giving the total slack in the caption? I find that my geometric intuition for how the geometric margin acts as a regularizer falls away as soon as we allow slack. Maybe some pictures would help :)
My feeling is that the right way to show the effect of regularization in the soft-margin case is to convert to the Ivanov form. I will come up with a good picture or two; a rough sketch of how one might be generated is below.
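As a starting point, something along these lines should produce the two fits plus the numbers for the caption (a minimal sketch using sklearn's linear-kernel SVC; the toy dataset and the two C values are placeholders, not anything from the notes):

```python
# Sketch: fit soft-margin SVMs at two values of C and report the
# geometric margin (1/||w||) and total slack for each, for use in a caption.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy 2-d data that is not linearly separable (placeholder data)
X, y01 = make_blobs(n_samples=100, centers=2, cluster_std=2.5, random_state=0)
y = 2 * y01 - 1  # relabel to {-1, +1}

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, C in zip(axes, [0.01, 100.0]):  # small C vs. large C (placeholder values)
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w, b = clf.coef_.ravel(), clf.intercept_[0]
    margin = 1.0 / np.linalg.norm(w)                          # geometric margin
    slack = np.maximum(0, 1 - y * clf.decision_function(X))   # xi_i values
    # decision boundary (level 0) and margin lines (levels +-1)
    xx = np.linspace(X[:, 0].min(), X[:, 0].max(), 200)
    for level, style in [(-1, "--"), (0, "-"), (1, "--")]:
        ax.plot(xx, (level - b - w[0] * xx) / w[1], style, color="k")
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap="bwr", s=20)
    ax.set_title(f"C={C}: margin={margin:.2f}, total slack={slack.sum():.1f}")
plt.tight_layout()
plt.show()
```

The small-C panel should show the wide-margin / high-slack fit and the large-C panel the narrow-margin / low-slack one, which is the contrast the caption would highlight.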
These slides have a nice way of motivating max margin in the hard-margin case: http://bit.ly/2jOnINC For a picture showing separation but with a very small margin, they say: "Better not these, since they would be “risky” for future samples." And: "Maximum-margin solution: most “stable” decision under perturbations of the input." They also have some slides trying to show the tradeoff between margin and slack for soft margin -- I don't think those are as good, but it's an attempt. Maybe it's just a matter of "what's our allowance for chasing outliers".
Let me know what you think now: https://github.com/davidrosenberg/mlcourse/blob/gh-pages/Labs/3-SVM-Notes_sol.pdf