modAL
decision_function instead of predict_proba
Several non-probabilistic estimators, SVMs in particular, could be used with uncertainty sampling. scikit-learn estimators that support the decision_function method can be used with the closest-to-hyperplane selection algorithm [Bloodgood]. This is a very popular strategy in active learning (AL) research and would be easy to implement.
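For a binary linear model, decision_function returns the signed score w·x + b, and the geometric distance of x to the separating hyperplane is |w·x + b| / ||w||. Below is a minimal sketch of closest-to-hyperplane selection in plain NumPy. It assumes a binary linear classifier, and the weights `w`, bias `b`, pool and helper names are hypothetical, chosen so the example runs without scikit-learn or modAL:

```python
import numpy as np


def hyperplane_distances(X: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Unsigned distance of each row of X to the hyperplane w.x + b = 0."""
    scores = X @ w + b  # the quantity decision_function computes for a binary linear model
    return np.abs(scores) / np.linalg.norm(w)


def select_closest(X: np.ndarray, w: np.ndarray, b: float) -> int:
    """Index of the pool instance lying closest to the decision boundary."""
    return int(np.argmin(hyperplane_distances(X, w, b)))


# Hypothetical 2-D pool and hyperplane x0 + x1 - 1 = 0 (not from the original thread).
X_pool = np.array([[0.0, 0.0], [0.6, 0.5], [2.0, 2.0], [-1.0, 3.0]])
w = np.array([1.0, 1.0])
b = -1.0
print(select_closest(X_pool, w, b))  # 1, since |0.6 + 0.5 - 1| is the smallest score
```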
If you just want to use closest-to-hyperplane selection with a linear SVM, you could also write a query strategy yourself, like:
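import numpy as np
from modAL.models.base import BaseLearner

def closest_to_hyperplane_sampling(linearSVC_learner: BaseLearner, X_pool):
    # Signed score w.x + b for each pool instance (binary LinearSVC assumed).
    y = linearSVC_learner.estimator.decision_function(X_pool)
    # Dividing by ||w|| turns the score into a geometric distance to the hyperplane.
    w_norm = np.linalg.norm(linearSVC_learner.estimator.coef_)
    dist = y / w_norm
    # The instance closest to the hyperplane is the one the model is least sure about.
    query_idx = np.argmin(np.abs(dist))
    return query_idx, X_pool[query_idx]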
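modAL's built-in query strategies such as uncertainty_sampling accept an `n_instances` argument for batch queries. A hedged sketch of a batch version of the strategy above follows, still assuming a binary linear estimator. The name `closest_to_hyperplane_batch` and the `StubLinearEstimator` class are hypothetical; the stub only mimics the `decision_function` and `coef_` of a binary linear model so the example runs without scikit-learn:

```python
from types import SimpleNamespace

import numpy as np


def closest_to_hyperplane_batch(learner, X_pool, n_instances=1):
    """Return indices and rows of the n_instances pool points nearest the hyperplane."""
    # Absolute decision values; dividing by ||w|| is skipped since it cannot change the ranking.
    scores = np.abs(learner.estimator.decision_function(X_pool))
    # Ascending sort: smallest |score| first, i.e. closest to the decision boundary.
    query_idx = np.argsort(scores)[:n_instances]
    return query_idx, X_pool[query_idx]


class StubLinearEstimator:
    """Hypothetical stand-in exposing the decision_function of a binary linear model."""

    def __init__(self, w, b):
        self.coef_ = np.atleast_2d(np.asarray(w, dtype=float))
        self.intercept_ = np.array([b], dtype=float)

    def decision_function(self, X):
        return X @ self.coef_.ravel() + self.intercept_[0]


learner = SimpleNamespace(estimator=StubLinearEstimator([1.0, 1.0], -1.0))
X_pool = np.array([[0.0, 0.0], [0.6, 0.5], [2.0, 2.0], [-1.0, 3.5]])
idx, rows = closest_to_hyperplane_batch(learner, X_pool, n_instances=2)
print(idx)  # [1 0]
```

For large pools, np.argpartition could replace the full sort when only the top few indices are needed.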
This is great, thank you for the tip. I did not realize it would be this simple.
I am curious why you normalize the confidence scores. I can't see how applying a linear scaling factor of (1 / w_norm) to the y vector would change the return value of the function. Is it simply to ensure all values are between 0 and 1?
Yes, you are right. You do not need this step.
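Why the step can be dropped: for a given model, ||w|| is a single positive constant, and dividing every score by the same positive number preserves their order, so the argmin is unchanged. The division converts the raw score into a geometric distance in feature-space units, which is not confined to [0, 1]. A quick NumPy check with hypothetical scores and a hypothetical norm:

```python
import numpy as np

# Hypothetical decision_function outputs for a pool of five instances.
scores = np.array([2.5, -0.3, 4.0, 0.7, -6.0])
w_norm = 0.2  # hypothetical ||w||; any positive value behaves the same way

raw_choice = np.argmin(np.abs(scores))
scaled = np.abs(scores / w_norm)
scaled_choice = np.argmin(scaled)

print(raw_choice, scaled_choice)  # same index either way
print(scaled.max() > 1)  # True: distances are not bounded to [0, 1]
```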
Thought so. Cheers!