LeoGrin comments

Results: 17 comments of LeoGrin

Column-wise parallelism for Column Transformer

I understand, but I think which option is the fastest is not obvious for univariate transformers. Quoting @glemaitre: > To give a concrete example, I think it would be faster...

Column-wise parallelism for Column Transformer

I agree, though I think I should explain more where I'm coming from for my first proposal (i.e., quickly summarize https://github.com/skrub-data/skrub/pull/592): - if you parallelize the transformers by splitting the...

`GapEncoder` is slow

> Sorry, I also forgot to report the conclusion of my experiments. I did not find any major bottleneck in the encoder. From my experience, the gap encoder is slow...

Add test for `CHANGES.rst`

I think this can be closed.

New README proposal

LGTM

Handle id columns differently

> The only challenge would be how to differentiate between an ID column and some other high cardinality column (for instance, the population of two countries is never exactly the same...

Better threshold metric for fuzzy_join

> I would love some opinions on > which of the reference distances do we want to support, and if we want more than 1 which should be the default...