svm-vehicle-detector
Target values in {-1,1} with Squared Hinge Loss
Hi,
As per the documentation of the squared hinge loss, the target labels must be in {-1, 1}, but I don't see anywhere in your code where the target labels are mapped to that set. According to this line:

train_labels = np.concatenate((np.ones(pos_train.shape[0],), np.zeros(neg_train.shape[0],)))

the target labels are in {0, 1}. Can you kindly elaborate on this?
Thanks in advance.
Good question. The sklearn.svm.LinearSVC class doesn't care about the actual values of the labels, only that they are distinct. Internally, it encodes the labels as {0, 1} regardless of the values provided (or, for multiclass with n classes, as {0, 1, ..., n-1}). The SVM solver implementation takes this encoding into account when computing the loss.
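As a quick sanity check (my own snippet, not part of this repository), fitting LinearSVC with the squared hinge loss on the same data labelled with {0, 1} and with {-1, +1} should produce the same model:

import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y01 = (X[:, 0] + 0.1 * rng.randn(200) > 0).astype(int)  # labels in {0, 1}
ypm = 2 * y01 - 1                                        # same labels in {-1, +1}

clf01 = LinearSVC(loss='squared_hinge', random_state=0).fit(X, y01)
clfpm = LinearSVC(loss='squared_hinge', random_state=0).fit(X, ypm)

print(np.allclose(clf01.coef_, clfpm.coef_))            # True
print(np.allclose(clf01.intercept_, clfpm.intercept_))  # True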
For proof of this, we can examine the source code of sklearn.svm.LinearSVC, specifically these lines, which call _fit_liblinear:
self.coef_, self.intercept_, self.n_iter_ = _fit_liblinear(
    X, y, self.C, self.fit_intercept, self.intercept_scaling,
    self.class_weight, self.penalty, self.dual, self.verbose,
    self.max_iter, self.tol, self.random_state, self.multi_class,
    self.loss, sample_weight=sample_weight)
In turn, _fit_liblinear encodes the labels as follows:
if loss not in ['epsilon_insensitive', 'squared_epsilon_insensitive']:
    enc = LabelEncoder()
    y_ind = enc.fit_transform(y)
    classes_ = enc.classes_
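You can see this encoding directly with a small standalone example (mine, not from the repository); LabelEncoder maps whichever two label values it is given onto {0, 1}:

import numpy as np
from sklearn.preprocessing import LabelEncoder

enc = LabelEncoder()
print(enc.fit_transform(np.array([1., 0., 0., 1.])))  # [1 0 0 1]
print(enc.classes_)                                   # [0. 1.]
print(enc.fit_transform(np.array([1, -1, -1, 1])))    # [1 0 0 1]
print(enc.classes_)                                   # [-1  1]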
Examining preprocessing.LabelEncoder.fit_transform(), we see that it calls _encode():
def fit_transform(self, y):
    """Fit label encoder and return encoded labels

    Parameters
    ----------
    y : array-like of shape [n_samples]
        Target values.

    Returns
    -------
    y : array-like of shape [n_samples]
    """
    y = column_or_1d(y, warn=True)
    self.classes_, y = _encode(y, encode=True)
    return y
In turn, _encode() calls _encode_numpy() (if it's provided numerical values):
if values.dtype == object:
    try:
        res = _encode_python(values, uniques, encode)
    except TypeError:
        types = sorted(t.__qualname__
                       for t in set(type(v) for v in values))
        raise TypeError("Encoders require their input to be uniformly "
                        f"strings or numbers. Got {types}")
    return res
else:
    return _encode_numpy(values, uniques, encode,
                         check_unknown=check_unknown)
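For the labels built in this repository (np.ones/np.zeros, i.e. a float array), the dtype is not object, so it is the _encode_numpy() branch that gets taken. A quick check of my own:

import numpy as np

train_labels = np.concatenate((np.ones(3,), np.zeros(2,)))
print(train_labels.dtype)            # float64
print(train_labels.dtype == object)  # False, so _encode_numpy() is used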
Finally, _encode_numpy() calls numpy.unique():
def _encode_numpy(values, uniques=None, encode=False, check_unknown=True):
    # only used in _encode below, see docstring there for details
    if uniques is None:
        if encode:
            uniques, encoded = np.unique(values, return_inverse=True)
            return uniques, encoded
which returns encoded labels in {0, 1, ..., n-1} (for binary labels, {0, 1}).
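To make that last step concrete (again, just an illustrative snippet), np.unique(..., return_inverse=True) yields the same {0, 1} encoding whether the labels come in as {0, 1} or as {-1, 1}:

import numpy as np

uniques, encoded = np.unique(np.array([1., 0., 0., 1., 1.]), return_inverse=True)
print(uniques, encoded)  # [0. 1.] [1 0 0 1 1]

uniques, encoded = np.unique(np.array([1, -1, -1, 1, 1]), return_inverse=True)
print(uniques, encoded)  # [-1  1] [1 0 0 1 1]

So whether the training labels are {0, 1} or {-1, 1}, the solver sees exactly the same encoding, and the {-1, 1} convention of the squared hinge loss is handled inside the solver rather than by the calling code.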