
Stacking classifier has no attribute predict_proba

qiagu opened this issue Nov 22 '19 • 7 comments

Many classifiers have no predict_proba attribute, such as many linear models and the SVC family of classifiers. Instead, they offer a decision_function in scikit-learn's implementation. The current stacking classifiers fail to stack such predict_proba-incompatible base estimators when use_probas is set to True. @rasbt Do you think it would be good to add decision_function support? And does it make sense to stack decision_function scores together with predict_proba values as meta features?

https://github.com/rasbt/mlxtend/blob/master/mlxtend/classifier/stacking_classification.py#L218-L224
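
To make the failure mode concrete (a sketch for illustration; LinearSVC stands in for any decision_function-only estimator):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=100, random_state=0)

    svc = LinearSVC().fit(X, y)
    lr = LogisticRegression().fit(X, y)

    # LinearSVC exposes decision_function but not predict_proba
    print(hasattr(svc, "predict_proba"))      # False
    print(hasattr(svc, "decision_function"))  # True

    print(lr.predict_proba(X[:2]))       # class probabilities in [0, 1]
    print(svc.decision_function(X[:2]))  # unbounded signed scores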

qiagu commented Nov 22 '19 17:11

Hm, good point. But are the outputs of decision_function on the same scale (or in the [0, 1] range) as probabilities? I think that SVC, for example, computes the distance from the hyperplane, so wouldn't that be dataset specific? (Genuine question, since I never really used decision_function.)

rasbt commented Nov 22 '19 18:11

are the outputs of decision_function on the same scale (or in the [0, 1] range) as probabilities?

Probably not. My knowledge about decision_function is very limited though.

I don't quite understand your second question. But I feel that supporting decision_function when use_probas is True would not interfere with anything else, right? If you agree, I can try to implement it.

qiagu commented Nov 22 '19 18:11

I was thinking of the EnsembleVoteClassifier, where the probabilities can be averaged via soft voting; that would be a problem if the decision_function values were on a different scale than the predict_proba values. In the stacking classifiers, I think it doesn't matter, though.

I don't quite understand your second question.

If it is a distance from the hyperplane, then the distance value would be 10 times larger if the training dataset were scaled by a factor of 10. That is, the predict_proba probabilities from logistic regression would stay the same, whereas the SVC decision_function values would be 10 times larger. But I guess it doesn't matter in stacking.
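
For illustration (a sketch, not from the thread): decision_function returns unbounded, dataset-dependent scores, while predict_proba always stays in [0, 1], which is why averaging the two would be problematic:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=200, random_state=0)

    scores = LinearSVC().fit(X, y).decision_function(X)
    probas = LogisticRegression().fit(X, y).predict_proba(X)

    print(scores.min(), scores.max())  # dataset-specific, unbounded range
    print(probas.min(), probas.max())  # always within [0, 1]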

If you agree, I can try to implement it.

I think this makes sense then. It would be best to extend this with a parameter

  • use_decision_function which can be True or False (similar to use_probas)

This way, one can use either use_decision_function or use_probas, or both.

What do you think?
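
A rough sketch of how the meta-feature assembly could look with both flags (make_meta_features is a hypothetical helper, not mlxtend's actual code):

    import numpy as np

    def make_meta_features(estimators, X, use_probas=True,
                           use_decision_function=False):
        # stack the chosen per-estimator outputs column-wise
        columns = []
        for est in estimators:
            added = False
            if use_probas and hasattr(est, "predict_proba"):
                columns.append(est.predict_proba(X))
                added = True
            if use_decision_function and hasattr(est, "decision_function"):
                scores = est.decision_function(X)
                # binary problems return a 1-D array; reshape to one column
                columns.append(scores.reshape(len(X), -1))
                added = True
            if not added:
                # fall back to hard predictions if neither output exists
                columns.append(est.predict(X).reshape(-1, 1))
        return np.hstack(columns)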

rasbt commented Nov 22 '19 19:11

Great point. It sounds a little more complicated than I initially thought. I need to think about how to implement it and whether it's worth the effort. :)

Another predict_proba / decision_function question came to mind. The stacking classifiers support predict_proba by returning the predict_proba of the meta classifier. What if the meta_clf_ has no predict_proba, or has both decision_function and predict_proba? When using roc_auc_scorer or average_precision_scorer, scikit-learn always tries decision_function first (https://github.com/scikit-learn/scikit-learn/blob/1495f69242646d239d89a5713982946b8ffcf9d9/sklearn/metrics/scorer.py#L181-L188). Do you think the stacking classifiers need a decision_function method?
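
The linked scorer logic boils down to something like this (a simplified sketch for the binary case):

    from sklearn.metrics import roc_auc_score

    def auc_score(estimator, X, y):
        # prefer decision_function; fall back to the positive-class
        # column of predict_proba, mirroring scikit-learn's scorer
        try:
            y_score = estimator.decision_function(X)
        except AttributeError:
            y_score = estimator.predict_proba(X)[:, 1]
        return roc_auc_score(y, y_score)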

It may be hard to manage both methods. The picture in my mind is messy.

qiagu commented Nov 22 '19 19:11

It seems SVC has both decision_function and predict_proba, but scikit-learn's notes about predict_proba make me scared to use it.

The probability model is created using cross validation, so the results can be slightly different than those obtained by predict. Also, it will produce meaningless results on very small datasets.

It's reasonable to use decision_function instead of predict_proba to calculate roc-auc and average_precision scores, right? Frankly, these two are the most popular metrics in my research field, so I should care about them.
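
The disagreement mentioned in the quoted note is easy to check (results vary with the data; a small dataset is used on purpose):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=60, random_state=0)
    clf = SVC(probability=True, random_state=0).fit(X, y)

    hard = clf.predict(X)
    from_proba = clf.classes_[np.argmax(clf.predict_proba(X), axis=1)]
    # counts samples where predict and predict_proba disagree
    print((hard != from_proba).sum())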

qiagu commented Nov 22 '19 20:11

What if the meta_clf_ has no predict_proba ...

In that case, calling stackingclassifier.predict_proba should raise an error. Alternatively, this method (stackingclassifier.predict_proba) could be removed via the constructor if the meta-classifier doesn't have a predict_proba. Do you know how scikit-learn handles cases where there is no predict_proba support? Does it raise an error when calling it or does the method not exist for such cases?

or has both decision_function and predict_proba?

Imho, it would make the most sense if

  • stackingclassifier.predict_proba outputs the predict_proba via the meta-classifier
  • we add an additional stackingclassifier.decision_function for this case
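
In current scikit-learn, the available_if decorator (older versions used if_delegate_has_method) gives exactly this conditional behavior. A hypothetical minimal sketch, not mlxtend's implementation:

    import numpy as np
    from sklearn.base import BaseEstimator, ClassifierMixin
    from sklearn.utils.metaestimators import available_if

    def _meta_has(attr):
        # available_if hides the method when this check fails, so calling
        # it raises an AttributeError, just like on a plain estimator
        return lambda self: hasattr(self.meta_clf_, attr)

    class StackingSketch(BaseEstimator, ClassifierMixin):
        # toy stacker: meta-features are the base models' hard predictions
        def __init__(self, classifiers, meta_classifier):
            self.classifiers = classifiers
            self.meta_classifier = meta_classifier

        def fit(self, X, y):
            self.clfs_ = [c.fit(X, y) for c in self.classifiers]
            self.meta_clf_ = self.meta_classifier.fit(self._meta_features(X), y)
            return self

        def _meta_features(self, X):
            return np.column_stack([c.predict(X) for c in self.clfs_])

        def predict(self, X):
            return self.meta_clf_.predict(self._meta_features(X))

        @available_if(_meta_has("predict_proba"))
        def predict_proba(self, X):
            return self.meta_clf_.predict_proba(self._meta_features(X))

        @available_if(_meta_has("decision_function"))
        def decision_function(self, X):
            return self.meta_clf_.decision_function(self._meta_features(X))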

It's reasonable to use decision_function instead of predict_proba to calculate roc-auc and average_precision scores, right?

I suppose so. Actually, using a decision threshold is more natural than working with the probabilities. I am not exactly sure how scikit-learn handles ROC + decision thresholds -- I don't usually use support vector machines.
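
For the binary case at least, roc_auc_score accepts raw decision_function scores directly, since ROC AUC depends only on the ranking of the scores:

    from sklearn.datasets import make_classification
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=300, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    clf = LinearSVC().fit(X_tr, y_tr)
    print(roc_auc_score(y_te, clf.decision_function(X_te)))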

rasbt commented Nov 24 '19 19:11

Do you know how scikit-learn handles cases where there is no predict_proba support? Does it raise an error when calling it or does the method not exist for such cases?

It seems an AttributeError is simply raised when calling predict_proba on an estimator that doesn't have it.

I vote for adding decision_function to the stacking classifiers, with a structure similar to predict_proba. I imagine most scenarios would then be handled well: when calculating roc_auc/ap with the corresponding scorer, decision_function is called first; if the meta_clf_ has no decision_function, the resulting AttributeError is caught and predict_proba is called instead.

For the decision_function implementation, I can give it a try.

qiagu commented Nov 24 '19 20:11