ENH Remove the `OneHotEncoder` inheritance from `SimilarityEncoder`

Open Vincent-Maladiere opened this issue 2 years ago • 6 comments

Problem Description

Follows up on https://github.com/skrub-data/skrub/pull/801

The `SimilarityEncoder` inherits from scikit-learn's `OneHotEncoder`. This parent class is heavier than we need: we get almost nothing from it, since we merely call check_X during `fit`.

Feature Description

Replace the inheritance with (TransformerMixin, BaseEstimator) and make the relevant small updates. This would also be the opportunity to perform some refactoring if needed.
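
For illustration, here is a rough sketch of what an encoder built directly on (TransformerMixin, BaseEstimator) could look like. Everything in it is an assumption made for the example: `ToySimilarityEncoder`, `_ngrams`, `_jaccard` and the `ngram_size` parameter are hypothetical names, and the character n-gram Jaccard similarity is a simple stand-in, not the similarity skrub actually computes. Input validation uses scikit-learn's public `check_array` in place of the inherited check.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils.validation import check_array, check_is_fitted


def _ngrams(string, n):
    # Set of character n-grams of the space-padded string (toy helper).
    padded = f" {string} "
    return {padded[i:i + n] for i in range(max(len(padded) - n + 1, 1))}


def _jaccard(a, b, n):
    # Jaccard similarity between the n-gram sets of two strings.
    grams_a, grams_b = _ngrams(a, n), _ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


class ToySimilarityEncoder(TransformerMixin, BaseEstimator):
    """Hypothetical sketch, NOT skrub's SimilarityEncoder."""

    def __init__(self, ngram_size=3):
        self.ngram_size = ngram_size

    def fit(self, X, y=None):
        # dtype=None keeps string columns as they are instead of casting to float.
        X = check_array(X, dtype=None)
        self.n_features_in_ = X.shape[1]
        # One sorted array of known categories per input column.
        self.categories_ = [np.unique(X[:, j]) for j in range(X.shape[1])]
        return self

    def transform(self, X):
        check_is_fitted(self, "categories_")
        X = check_array(X, dtype=None)
        blocks = []
        for j, categories in enumerate(self.categories_):
            # Similarity of every value in column j to every known category.
            block = np.array(
                [
                    [_jaccard(value, cat, self.ngram_size) for cat in categories]
                    for value in X[:, j]
                ]
            )
            blocks.append(block)
        return np.hstack(blocks)


# Usage: one output column per category seen during fit.
encoder = ToySimilarityEncoder().fit([["london"], ["paris"], ["londres"]])
encoded = encoder.transform([["london"], ["madrid"]])  # shape (2, 3)
```

`fit_transform`, `get_params` and `set_params` still come for free from the two mixins, so nothing in the sketch depends on `OneHotEncoder`.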

Alternative Solutions

No response

Additional Context

No response

Vincent-Maladiere, Oct 31 '23 16:10

also following other discussions, should this encoder be made to work on dataframes and manipulate columns by name rather than index?

jeromedockes, Nov 06 '23 13:11

> also following other discussions, should this encoder be made to work on dataframes and manipulate columns by name rather than index?

Ideally, it would work on dataframes and arrays, don't you think?

GaelVaroquaux, Nov 06 '23 20:11