modAL

decision_function instead of predict_proba

Open · lkurlandski opened this issue 3 years ago · 5 comments

Several non-probabilistic estimators, SVMs in particular, can still be used for uncertainty sampling. Scikit-learn estimators that implement the decision_function method can be used with the closest-to-hyperplane selection algorithm [Bloodgood]. This strategy is popular in active learning (AL) research and should be easy to implement.
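As a minimal sketch of what the proposal could look like, assuming only a fitted estimator that exposes a scikit-learn style `decision_function` (for example `sklearn.svm.LinearSVC`, or `SVC` with its default one-vs-rest output shape), the query strategy below ranks pool samples by how close their scores are to the decision boundary. It is not part of modAL's API: the name `decision_function_sampling`, the `n_instances` parameter, and the multiclass rule (gap between the two highest per-class scores) are illustrative assumptions.

```python
import numpy as np


def decision_function_sampling(estimator, X_pool, n_instances=1):
    """Hypothetical query strategy based on decision_function scores.

    Assumes `estimator` is already fitted and exposes a scikit-learn style
    decision_function. Returns (indices, rows) of the selected samples.
    """
    X_pool = np.asarray(X_pool)
    scores = np.asarray(estimator.decision_function(X_pool))
    if scores.ndim == 1:
        # Binary case: one signed score per sample; a small |score| means
        # the sample lies close to the decision boundary.
        uncertainty = np.abs(scores)
    else:
        # Multiclass case (assumes one score per class, as in one-vs-rest):
        # a small gap between the two highest scores means low confidence.
        top_two = np.sort(scores, axis=1)[:, -2:]
        uncertainty = top_two[:, 1] - top_two[:, 0]
    # Most uncertain samples first; "stable" keeps ties in pool order.
    query_idx = np.argsort(uncertainty, kind="stable")[:n_instances]
    return query_idx, X_pool[query_idx]
```

The multiclass branch would not be meaningful for `SVC(decision_function_shape="ovo")`, whose columns are pairwise scores rather than one score per class.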

lkurlandski, Apr 21 '22 12:04

If you just want closest-to-hyperplane selection with a linear SVM, you could also write your own query strategy, for example:

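```python
import numpy as np
from modAL.models.base import BaseLearner

def closest_to_hyperplane_sampling(linearSVC_learner: BaseLearner, X_pool):
    # Signed scores w.x + b (assumes a fitted, binary LinearSVC estimator)
    y = linearSVC_learner.estimator.decision_function(X_pool)
    w_norm = np.linalg.norm(linearSVC_learner.estimator.coef_)
    dist = y / w_norm  # geometric distance to the separating hyperplane
    query_idx = np.argmin(np.abs(dist))  # pool sample closest to the hyperplane
    return query_idx, X_pool[query_idx]
```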
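For a linear SVM, decision_function returns w·x + b, so dividing by the norm of w gives the signed Euclidean distance (w·x + b) / ‖w‖ from each sample to the hyperplane. The sketch below computes that quantity directly from `coef_`-like and `intercept_`-like arrays with NumPy and returns the `n_instances` closest samples. The function name `hyperplane_distance_sampling` and the batch parameter are hypothetical, and it assumes the binary case, where `coef` has shape (1, n_features).

```python
import numpy as np


def hyperplane_distance_sampling(coef, intercept, X_pool, n_instances=1):
    """Hypothetical batch variant of closest-to-hyperplane selection.

    coef: shape (1, n_features), like a binary LinearSVC's coef_.
    intercept: shape (1,), like a binary LinearSVC's intercept_.
    """
    X_pool = np.asarray(X_pool, dtype=float)
    w = np.ravel(np.asarray(coef, dtype=float))
    b = float(np.ravel(intercept)[0])
    # Same values a linear model's decision_function would give: w.x + b
    scores = X_pool @ w + b
    # Signed Euclidean distance to the hyperplane w.x + b = 0
    dist = scores / np.linalg.norm(w)
    # Indices of the n_instances samples with the smallest |distance|
    query_idx = np.argsort(np.abs(dist), kind="stable")[:n_instances]
    return query_idx, X_pool[query_idx]
```

With a fitted binary `sklearn.svm.LinearSVC` you would pass its `coef_` and `intercept_`; the intermediate `scores` should then match its `decision_function(X_pool)` up to floating-point rounding.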

e-rich, Apr 28 '22 09:04

This is great, thank you for the tip. I didn't realize how simple it would be.

lkurlandski, May 01 '22 00:05

I am curious why you normalize the confidence scores. Since w_norm is a positive constant, I can't see how scaling the y vector by 1 / w_norm would change the function's return value. Is it simply to ensure all values are between 0 and 1?

lkurlandski, May 03 '22 00:05

Yes, you are right: you do not need this step.

e-rich, May 03 '22 08:05
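The point settled above can be stated as a one-line argument. Writing $s_i$ for the decision score of pool sample $x_i$ (the y vector in the code) and $w$ for the weight vector, $\lVert w \rVert > 0$ for any non-degenerate linear SVM, so dividing by it cannot change which sample attains the minimum:

$$
s_i = w^\top x_i + b, \qquad
\arg\min_i \frac{\lvert s_i \rvert}{\lVert w \rVert} = \arg\min_i \lvert s_i \rvert .
$$

The division converts scores into geometric distances from the hyperplane but does not bound them to [0, 1]; a sample far from the hyperplane can still have a distance greater than 1. It would only matter if the distances themselves were reported or compared across models with different weight vectors.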

Thought so. Cheers!

lkurlandski, May 03 '22 16:05