
Newer versions of sklearn require better data validation and extra estimator tags

Open mvdoc opened this issue 9 months ago • 0 comments

I implemented a similar fix in the voxelwise tutorials (https://github.com/gallantlab/voxelwise_tutorials/pull/34). Here we probably want to keep the fix backward compatible so that versions of sklearn < 1.6 still work.

FAILED himalaya/ridge/tests/test_sklearn_api_ridge.py::test_check_estimator[torch-GroupRidgeCV_()-check_estimator_tags_renamed] - TypeError: Estimator GroupRidgeCV_ has defined either `_more_tags` or `_get_tags`, but not `__sklearn_tags__`. If you're customizing tags, and need to support multiple scikit-learn versions, you can implement both `__sklearn_tags__` and `_more_tags` or `_get_tags`. This change was introduced in scikit-learn=1.6
FAILED himalaya/ridge/tests/test_sklearn_api_ridge.py::test_check_estimator[torch-GroupRidgeCV_()-check_n_features_in_after_fitting] - AssertionError: `GroupRidgeCV_.predict()` does not check for consistency between input number
of features with GroupRidgeCV_.fit(), via the `n_features_in_` attribute.
You might want to use `sklearn.utils.validation.validate_data` instead
of `check_array` in `GroupRidgeCV_.fit()` and GroupRidgeCV_.predict()`. This can be done
like the following:
from sklearn.utils.validation import validate_data
...
class MyEstimator(BaseEstimator):
    ...
    def fit(self, X, y):
        X, y = validate_data(self, X, y, ...)
        ...
        return self
    ...
    def predict(self, X):
        X = validate_data(self, X, ..., reset=False)
        ...
        return X
= 56 failed, 1434 passed, 1612 skipped, 6089 warnings, 112 rerun in 105.74s (0:01:45) =
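
One possible shape for a backward-compatible fix is sketched below. This is only an illustration under stated assumptions, not himalaya's actual code: `ToyRidge` is a hypothetical stand-in for `GroupRidgeCV_`, the `multioutput` tag is picked just as an example, and the fallback branch assumes older sklearn releases expose the private `BaseEstimator._validate_data` method. The idea is to import `validate_data` when it exists (sklearn >= 1.6), fall back to the private method otherwise, and define both `__sklearn_tags__` and `_more_tags` as the error message suggests.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils.validation import check_is_fitted

try:
    # sklearn >= 1.6 provides validate_data as a public function.
    from sklearn.utils.validation import validate_data
except ImportError:
    # Assumption: older releases only have the private
    # BaseEstimator._validate_data method, so wrap it in the same interface.
    def validate_data(estimator, X, y=None, **kwargs):
        if y is None:
            return estimator._validate_data(X, **kwargs)
        return estimator._validate_data(X, y, **kwargs)


class ToyRidge(RegressorMixin, BaseEstimator):
    """Hypothetical estimator used for illustration only."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        # Validates X and y and records n_features_in_ on the estimator.
        X, y = validate_data(self, X, y, multi_output=True, y_numeric=True)
        gram = X.T @ X + self.alpha * np.eye(X.shape[1])
        self.coef_ = np.linalg.solve(gram, X.T @ y)
        return self

    def predict(self, X):
        check_is_fitted(self)
        # reset=False compares X against the n_features_in_ stored by fit.
        X = validate_data(self, X, reset=False)
        return X @ self.coef_

    def __sklearn_tags__(self):
        # Tag API read by sklearn >= 1.6.
        tags = super().__sklearn_tags__()
        tags.target_tags.multi_output = True
        return tags

    def _more_tags(self):
        # Tag API read by sklearn < 1.6.
        return {"multioutput": True}
```

With this pattern, calling `predict` with a different number of features than `fit` saw should raise a `ValueError`, which is what `check_n_features_in_after_fitting` looks for.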

mvdoc, Mar 10 '25 14:03