Faster alternative to GapEncoder

Open jeromedockes opened this issue 1 year ago • 1 comment

Problem Description

For encoding text and high-cardinality categories, we currently have MinHashEncoder, which only works when the downstream learner is based on decision trees, and GapEncoder, which gives high-quality representations but is very slow. It would be good to have something similar to the GapEncoder but faster, perhaps based on an SVD or scikit-learn's NMF.

Feature Description

An encoder that works similarly to GapEncoder but is faster, possibly at the cost of less interpretable topics or slightly reduced prediction performance.
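
To make the idea concrete, here is a minimal sketch of what such an encoder could look like, built only from existing scikit-learn pieces: character n-gram TF-IDF followed by a truncated SVD. This is an illustration under assumptions, not a proposed skrub API: the helper name `make_svd_text_encoder`, the n-gram range, and the number of components are all made up for the example, and nothing here has been benchmarked against GapEncoder.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline


def make_svd_text_encoder(n_components=3, random_state=0):
    """Hypothetical helper: char n-gram TF-IDF followed by a truncated SVD.

    The n-gram range and n_components are illustrative defaults only.
    """
    return make_pipeline(
        # Character n-grams give some robustness to typos and
        # morphological variants in dirty categories.
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        # TruncatedSVD accepts the sparse TF-IDF matrix directly and
        # returns a dense, low-dimensional representation.
        TruncatedSVD(n_components=n_components, random_state=random_state),
    )


# Toy high-cardinality column with near-duplicate entries.
categories = [
    "senior software engineer",
    "software engineer",
    "sr. software eng.",
    "police officer",
    "police officer II",
    "school bus driver",
]

encoder = make_svd_text_encoder()
embeddings = encoder.fit_transform(categories)
print(embeddings.shape)
```

Unlike GapEncoder's topics, the SVD components can be negative and are not directly interpretable, which is the trade-off mentioned above. Swapping `TruncatedSVD` for scikit-learn's `NMF` in the same pipeline would keep the output non-negative, at some extra fitting cost.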

jeromedockes, Jun 13 '24 08:06

related: #139

jeromedockes, Jun 13 '24 08:06

closing in favor of #1121

jeromedockes, Oct 23 '24 12:10