nltools

Linear SVM should return true distance to hyperplane

Open: ejolly opened this issue 7 years ago • 0 comments

Apparently sklearn's `decision_function` does not return the true (geometric) distance to the hyperplane. For a linear kernel it returns the raw score w·x + b, which is that distance multiplied by ||w||, so the values are hard to interpret and cannot be compared across datasets/models (not talking about probabilities at all here). In general the normalization is left undone because computing it depends entirely on the kernel you use. A linear kernel, however, makes the transform easy: dividing the raw score by ||w|| gives the geometric distance, which aids interpretation because in raw-score units the margin sits at ±1 (see here and here). The underlying libsvm library does not do this normalization for you automatically. As a result, scores can differ from those obtained when libsvm is used in other contexts (e.g. Matlab), and results may not agree.
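The relationship described above can be written out as a short derivation. This is standard SVM geometry; here w and b denote the weight vector and bias, which sklearn exposes as `coef_` and `intercept_` for a linear kernel.

```latex
f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b
\qquad \text{(raw score returned by \texttt{decision\_function})}

d(\mathbf{x}) = \frac{\mathbf{w}^\top \mathbf{x} + b}{\lVert \mathbf{w} \rVert_2}
             = \frac{f(\mathbf{x})}{\lVert \mathbf{w} \rVert_2}
\qquad \text{(signed distance to } \{\mathbf{z} : \mathbf{w}^\top \mathbf{z} + b = 0\})

|f(\mathbf{x})| = 1 \;\Rightarrow\; |d(\mathbf{x})| = \frac{1}{\lVert \mathbf{w} \rVert_2},
\qquad \text{margin width} = \frac{2}{\lVert \mathbf{w} \rVert_2}
```

Since ||w|| depends on C and on the scale of the features, two models can place the same hyperplane yet report different raw scores; only d(x) is directly comparable.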

In the specific case of an SVM with a linear kernel, we can return this value instead:

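import numpy as np
from sklearn.svm import SVC

svc = SVC(kernel='linear').fit(x, labels)  # x: (n_samples, n_features); labels: two classes
scores = svc.decision_function(x)  # raw scores w.x + b, scaled by ||w||
w_norm = np.linalg.norm(svc.coef_)  # binary case: coef_ has shape (1, n_features)
dist = scores / w_norm  # signed geometric distance to the hyperplane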
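To illustrate why the raw score is not comparable across models, here is a minimal plain-Python sketch that does not use sklearn. The helper names `decision_value` and `geometric_distance`, and the two weight vectors, are hypothetical. Scaling (w, b) by a positive constant leaves the hyperplane unchanged but scales the raw score; dividing by ||w|| removes that dependence.

```python
import math


def decision_value(w, b, x):
    """Raw linear score f(x) = w.x + b (what a linear SVM's decision function computes)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def geometric_distance(w, b, x):
    """Signed Euclidean distance from x to the hyperplane w.x + b = 0."""
    w_norm = math.sqrt(sum(wi * wi for wi in w))
    return decision_value(w, b, x) / w_norm


# Two hypothetical models describing the SAME hyperplane (x1 + x2 = 1),
# differing only by a positive scale factor on (w, b).
w_a, b_a = [1.0, 1.0], -1.0
w_b, b_b = [4.0, 4.0], -4.0
point = [2.0, 2.0]

raw_a = decision_value(w_a, b_a, point)  # 3.0
raw_b = decision_value(w_b, b_b, point)  # 12.0: same point, same hyperplane, 4x the score
dist_a = geometric_distance(w_a, b_a, point)
dist_b = geometric_distance(w_b, b_b, point)

print(raw_a, raw_b)
print(round(dist_a, 6), round(dist_b, 6))  # both equal 3 / sqrt(2)
```

The raw scores differ by the scale factor, while both normalized distances equal 3/√2, the Euclidean distance from (2, 2) to the line x1 + x2 = 1.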

ejolly, Nov 26 '18 22:11