sklearn-pandas
Column naming: compatibility with OneHotEncoder
In sklearn v0.24.1, the OneHotEncoder transformer exposes derived names in the categories_ attribute. Can we add one more check to https://github.com/scikit-learn-contrib/sklearn-pandas/blob/e84274643369fc6f75ca4b1b08824e188e96cd3f/sklearn_pandas/dataframe_mapper.py#L40 to cover this case?
Sure, can you create an MR and add a unit test? I will be happy to merge it.
The categories_ attribute does not represent the derived feature names. It actually contains "the categories of each feature determined during fitting" (see OneHotEncoder.categories_).
Nonetheless, as of sklearn 1.0 the transformers' get_feature_names method is deprecated in favor of get_feature_names_out. More info in PR #248.
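A check that covers both the old and the new method could look roughly like the sketch below. The helper derived_names and the three stand-in classes are hypothetical and are not taken from sklearn-pandas or from PR #248. They only show the idea of preferring get_feature_names_out and falling back to get_feature_names.

```python
def derived_names(transformer, input_features=None):
    """Hypothetical helper: return derived feature names, or None.

    Prefers the newer get_feature_names_out and falls back to the
    older get_feature_names when only that one is available.
    """
    for method_name in ("get_feature_names_out", "get_feature_names"):
        method = getattr(transformer, method_name, None)
        if callable(method):
            return list(method(input_features))
    return None


# Stand-in transformers (assumptions, used only to exercise the helper).
class NewStyle:
    def get_feature_names_out(self, input_features=None):
        return [f"{name}_a" for name in input_features]


class OldStyle:
    def get_feature_names(self, input_features=None):
        return [f"{name}_b" for name in input_features]


class NoNames:
    pass


print(derived_names(NewStyle(), ["x"]))
print(derived_names(OldStyle(), ["x"]))
print(derived_names(NoNames(), ["x"]))
```

Returning None when neither method exists leaves the caller free to fall back to its own default naming.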