openml-python
Small suggestion for compatibility with AutoSklearn
The openml/runs/functions.py file contains the code below, which retrieves the list of unique classes used inside the best estimator of the BaseSearchCV. I believe this list is often just np.unique(y), but some classifiers may use a different method.
    if isinstance(used_estimator, sklearn.model_selection._search.BaseSearchCV):
        model_classes = used_estimator.best_estimator_.classes_
    else:
        model_classes = used_estimator.classes_
Since Sklearn 0.19, the class BaseSearchCV has a classes_ property. I propose checking that property first, with a fallback that uses the old way. The resulting code would look like this:
    if isinstance(used_estimator, sklearn.model_selection._search.BaseSearchCV):
        # Since Sklearn 0.19, BaseSearchCV has a classes_ property.
        # Checking this property first lets subclasses and wrappers override it.
        if hasattr(used_estimator, "classes_"):
            model_classes = used_estimator.classes_
        else:
            # Fallback for older Sklearn versions without the property.
            model_classes = used_estimator.best_estimator_.classes_
    else:
        model_classes = used_estimator.classes_
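To illustrate the lookup order, here is a minimal duck-typed sketch. The helper name `get_model_classes` is hypothetical and not part of openml-python, and the `SimpleNamespace` objects are stand-ins for real estimators so the example runs without Sklearn. It drops the `isinstance` check, so it only approximates the snippet above: it behaves the same whenever a non-search estimator exposes `classes_`.

```python
from types import SimpleNamespace


def get_model_classes(used_estimator):
    """Hypothetical helper: prefer the estimator's own classes_, then fall back."""
    if hasattr(used_estimator, "classes_"):
        return used_estimator.classes_
    # Old-style search object: only the best estimator knows the classes.
    return used_estimator.best_estimator_.classes_


# Stand-in for a search object that exposes classes_ directly.
new_style_search = SimpleNamespace(
    classes_=[0, 1],
    best_estimator_=SimpleNamespace(classes_=[0, 1]),
)

# Stand-in for a search object without a classes_ property.
old_style_search = SimpleNamespace(
    best_estimator_=SimpleNamespace(classes_=["a", "b"]),
)

# Stand-in for an ensemble wrapper that has no best_estimator_ at all.
ensemble_wrapper = SimpleNamespace(classes_=["x", "y", "z"])

new_style_classes = get_model_classes(new_style_search)
old_style_classes = get_model_classes(old_style_search)
wrapper_classes = get_model_classes(ensemble_wrapper)
```

The third case is the one that matters for AutoSklearn: the lookup succeeds even though the object has no `best_estimator_`.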
This would make it easier and more correct to write a compatibility wrapper for AutoSklearn, which does not have a best_estimator_ (because it uses a weighted ensemble instead).
So used_estimator.best_estimator_.classes_ would not work in my case. I can, however, define the classes_ property on the wrapper itself so that it returns np.unique(y), where y is the target passed to the wrapper's fit() method (or obtain the classes in another way).
I suppose this would also be useful for users who want to write their own extension of BaseSearchCV, or who need to create a wrapper too.
Hmm... I can also set self.best_estimator_ = self. Then I can just define the classes_ property myself.
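A rough sketch of what such a wrapper could look like, under several assumptions: `EnsembleWrapper` and `_DummyAutoML` are hypothetical names, the dummy model stands in for a real AutoSklearn classifier, and `sorted(set(y))` stands in for `np.unique(y)` so the example needs no third-party packages. It is not the actual AutoSklearn or openml-python API.

```python
class _DummyAutoML:
    """Hypothetical stand-in for an ensemble model with no best_estimator_."""

    def fit(self, X, y):
        return self


class EnsembleWrapper:
    """Hypothetical compatibility wrapper exposing classes_ and best_estimator_."""

    def __init__(self, automl):
        self.automl = automl
        self._classes = None

    def fit(self, X, y):
        # Stand-in for np.unique(y): sorted unique labels seen during fit.
        self._classes = sorted(set(y))
        self.automl.fit(X, y)
        # Point best_estimator_ at the wrapper itself, so code that reads
        # best_estimator_.classes_ still finds the classes.
        self.best_estimator_ = self
        return self

    @property
    def classes_(self):
        # Raising AttributeError makes hasattr(wrapper, "classes_") return
        # False until the wrapper has been fitted.
        if self._classes is None:
            raise AttributeError("classes_ is only available after fit()")
        return self._classes


wrapper = EnsembleWrapper(_DummyAutoML())
fitted_before = hasattr(wrapper, "classes_")
wrapper.fit([[0], [1], [2]], ["b", "a", "b"])
classes_after_fit = wrapper.classes_
```

With both attributes defined, the wrapper works with either the old lookup through best_estimator_ or the proposed lookup through classes_.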
Great idea. Do you want to create a pull request for this repository and open an issue over at auto-sklearn? Thanks!