LeoGrin
I understand, but I think it is not obvious which option is fastest for univariate transformers. Quoting @glemaitre:
> To give a concrete example, I think it would be faster...
I agree, though I should explain where my first proposal came from (i.e., quickly summarize https://github.com/skrub-data/skrub/pull/592):
- if you parallelize the transformers by splitting the...
> Sorry, I also forgot to report the conclusion of my experiments. I did not find any major bottleneck in the encoder. From my experience, the gap encoder is slow...
I think this can be closed.
LGTM
> The only challenge would be how to differentiate between an ID column and some other high cardinality column (for instance, the population of two countries is never exactly the same...
> I would love some opinions on
> which of the reference distances do we want to support, and if we want more than 1 which should be the default...