sklearn-pandas

Column naming: compatibility with OneHotEncoder

Open stacymiller opened this issue 3 years ago • 2 comments

In sklearn v0.24.1, the OneHotEncoder transformer exposes derived names in the categories_ attribute. Can we add one more check to https://github.com/scikit-learn-contrib/sklearn-pandas/blob/e84274643369fc6f75ca4b1b08824e188e96cd3f/sklearn_pandas/dataframe_mapper.py#L40 to cover this case?

stacymiller avatar Mar 03 '21 23:03 stacymiller

Sure, can you create an MR and add a unit test? I will be happy to merge it.

ragrawal avatar May 08 '21 08:05 ragrawal

The categories_ attribute does not represent the derived feature names. It actually contains "the categories of each feature determined during fitting" (see OneHotEncoder.categories_).
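To illustrate the difference, here is a minimal pure-Python sketch. It does not call scikit-learn: the `categories` list is a hand-written stand-in for what a fitted encoder's categories_ would hold, and `derive_names` is a hypothetical helper that mimics the `<feature>_<category>` naming OneHotEncoder uses by default.

```python
# Hand-written stand-in for a fitted OneHotEncoder's categories_ (assumption):
# one entry per input column, listing the categories seen in that column.
categories = [["blue", "red"], ["L", "M", "S"]]
input_features = ["color", "size"]


def derive_names(input_features, categories):
    """Hypothetical helper: build flat output names as <feature>_<category>."""
    return [
        f"{feature}_{category}"
        for feature, feature_categories in zip(input_features, categories)
        for category in feature_categories
    ]


derived = derive_names(input_features, categories)

# categories has one entry per input column; derived has one per output column.
print(len(categories), len(derived))
print(derived)
```

So categories_ is grouped per input column, while the derived names are a flat list with one entry per one-hot output column.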

Nonetheless, as of sklearn 1.0, the transformers' get_feature_names method is deprecated in favor of get_feature_names_out. More info in PR #248.
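For reference, a version-tolerant lookup could look roughly like the sketch below. This is not the code in dataframe_mapper.py or in the linked PR; `output_feature_names` and the two stub classes are hypothetical names used only for illustration, and the sketch assumes both methods accept the input feature names as their single argument (as OneHotEncoder's do).

```python
def output_feature_names(transformer, input_features=None):
    """Hypothetical helper: return derived column names, or None if unavailable.

    Prefers get_feature_names_out and falls back to the older
    get_feature_names when only that one is present.
    """
    for method_name in ("get_feature_names_out", "get_feature_names"):
        method = getattr(transformer, method_name, None)
        if callable(method):
            # Assumption: the method accepts the input feature names.
            return [str(name) for name in method(input_features)]
    return None


class NewStyleStub:
    """Stub mimicking a transformer that has get_feature_names_out."""

    def get_feature_names_out(self, input_features=None):
        return [f"{input_features[0]}_a", f"{input_features[0]}_b"]


class OldStyleStub:
    """Stub mimicking a transformer that only has get_feature_names."""

    def get_feature_names(self, input_features=None):
        return [f"{input_features[0]}_a", f"{input_features[0]}_b"]


new_names = output_feature_names(NewStyleStub(), ["col"])
old_names = output_feature_names(OldStyleStub(), ["col"])
no_names = output_feature_names(object(), ["col"])
print(new_names, old_names, no_names)
```

Checking for the newer method first means the same call keeps working across the deprecation, and returning None lets the caller fall back to its own naming scheme.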

falcaopetri avatar Oct 17 '21 18:10 falcaopetri