[BUG] 'EstimatorTransformer' object has no attribute 'get_feature_names_out'

Open CarloLepelaars opened this issue 2 years ago • 4 comments

Calling get_feature_names_out on EstimatorTransformer, or on a Pipeline that contains an EstimatorTransformer, raises the following error: AttributeError: 'EstimatorTransformer' object has no attribute 'get_feature_names_out'

Minimal reproducible example:

from sklego.meta import EstimatorTransformer
from sklearn.linear_model import LinearRegression
EstimatorTransformer(LinearRegression()).get_feature_names_out(None)

I thought this issue was resolved in scikit-learn >= 1.1 (see the "get_feature_names_out Available in all Transformers" release highlight), but apparently custom scikit-learn transformers still need a manual implementation of get_feature_names_out.
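
A minimal sketch of that observation, assuming scikit-learn is installed: a custom transformer that only inherits from TransformerMixin and BaseEstimator does not seem to get get_feature_names_out for free. BareTransformer and NamedTransformer are hypothetical names used only for illustration.

```python
from sklearn.base import BaseEstimator, TransformerMixin


class BareTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical custom transformer without get_feature_names_out."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Identity transform, just enough to qualify as a transformer.
        return X


class NamedTransformer(BareTransformer):
    """Same transformer with a hand-written get_feature_names_out."""

    def get_feature_names_out(self, input_features=None):
        # transform() is the identity, so output names equal input names.
        if input_features is None:
            raise ValueError("input_features is required in this sketch")
        return list(input_features)


has_bare = hasattr(BareTransformer(), "get_feature_names_out")
has_named = hasattr(NamedTransformer(), "get_feature_names_out")
print(has_bare, has_named)
```

Only the subclass that defines the method by hand exposes it, which matches the behavior reported above.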

Proposed solution sketch:

from sklearn.base import BaseEstimator, MetaEstimatorMixin, TransformerMixin
from sklearn.utils.validation import check_is_fitted


class EstimatorTransformer(TransformerMixin, MetaEstimatorMixin, BaseEstimator):
    ...  # existing attributes and methods elided

    def fit(self, X, y, **kwargs):
        ...  # existing fitting logic elided
        # Store how many output columns the estimator has
        self.output_len_ = y.shape[1] if self.multi_output_ else 1
        ...  # rest of fit elided

    def get_feature_names_out(self, feature_names_out=None) -> list:
        """
        Get names for the output of EstimatorTransformer.
        The estimator must be fitted before this method can be called.
        """
        check_is_fitted(self.estimator_)
        if self.multi_output_:
            feature_names = [f"prediction_{i}" for i in range(self.output_len_)]
        else:
            feature_names = ["prediction"]
        return feature_names

Happy to contribute this if you agree with the proposed solution idea. If this is a general problem, I'm also open to working on implementing get_feature_names_out for other transformers in scikit-lego.

CarloLepelaars, Sep 14 '22 13:09

Minor ask: you can attach a language to a code-block to get syntax highlighting. Like so:

```python
import pandas as pd
```

That said, hmm... I'm wondering which other meta-estimators have the same issue. @CarloLepelaars I had a quick look at the VotingClassifier in sklearn, and it seems that even in sklearn not every meta-estimator implements get_feature_names_out.

I'm also curious if scikit-learn has tests for this behavior that we can copy. @CarloLepelaars did you check the sklearn repo for this by any chance?

koaning, Sep 14 '22 13:09

> Minor ask: you can attach a language to a code-block to get syntax highlighting.

Makes sense! Added syntax highlighting in comment above.

🤔 Interesting! Seems odd that it is implemented for VotingClassifier, but not for other estimators in the ensemble module like BaggingClassifier.

Here is an example of a get_feature_names_out test case for LDA in sklearn: https://github.com/scikit-learn/scikit-learn/blob/5bd81234e6e6501ddcddbfdfdc80b90a1302af55/sklearn/tests/test_discriminant_analysis.py#L659

CarloLepelaars, Sep 14 '22 14:09

@koaning, shall I go ahead and implement this for EstimatorTransformer? After that we can evaluate if implementation is needed for other Meta estimators in sklego. I'm sure the implementation for EstimatorTransformer will give insights on the need for get_feature_names_out in other Meta estimators.

CarloLepelaars, Sep 27 '22 09:09

Yes please!

koaning, Sep 27 '22 10:09