Gael Varoquaux

316 comments by Gael Varoquaux

> Would a wrapper function of some kind, doing all of this with a single command be feasible & useful?

Let's start with an example. It is unclear to me...

We should probably document this option better. A pull request to the doc / docstring discussing this in a way that you find clear would be most welcome. Thanks!

Hi @lsorber, we're prioritizing fixing this problem. It's due to the inference of which transformation gets applied to which column, and in particular the fact that the inference must be...

> if we could choose the distance to use, then using MinHash as the text encoder and "hamming" as the distance would be an approximation of 1 - Jaccard similarity,...
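Why this approximation holds: for two sets of tokens, each coordinate of their MinHash signatures agrees with probability equal to their Jaccard similarity, so the fraction of coordinates that differ (the normalized Hamming distance) estimates 1 - Jaccard. The sketch below checks this with the standard library only. It is not skrub's MinHashEncoder: the character 3-grams, the seeded md5 hash, and the helper names (`char_ngrams`, `minhash_signature`, etc.) are assumptions chosen for illustration.

```python
import hashlib


def char_ngrams(text, n=3):
    # Set of character n-grams of a string (no padding, an assumption)
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b):
    # Exact Jaccard similarity between two sets
    return len(a & b) / len(a | b)


def _hash(seed, token):
    # Seeded 64-bit hash built from md5; one "hash function" per seed
    digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens, n_components=256):
    # For each hash function, keep the minimum hash value over the tokens
    return [min(_hash(seed, t) for t in tokens) for seed in range(n_components)]


def hamming(u, v):
    # Normalized Hamming distance: fraction of coordinates that differ
    return sum(x != y for x, y in zip(u, v)) / len(u)


a, b = char_ngrams("London"), char_ngrams("Londonderry")
exact = 1 - jaccard(a, b)  # 4 shared 3-grams out of 8 distinct ones
estimate = hamming(minhash_signature(a, 512), minhash_signature(b, 512))
print(exact)     # 0.5
print(estimate)  # close to 0.5
```

The estimate's standard error shrinks as 1/sqrt(n_components), which is the usual trade-off between signature size and accuracy.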

> I wouldn't mind working on this if folks agree. This session stuff is the prime villain in any story about data leakage in ML pipelines.

Awesome. However, I...

> do we want to have a pipeline that runs before we split train/test?

At some point we will have to address this, and I think that it means that...
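To make the leakage concern concrete, here is a minimal stdlib sketch of why a transformation fitted before the train/test split leaks information: a mean imputer fitted on the full column "sees" test values. The helpers `fit_mean_imputer` and `transform` and the toy column are hypothetical, for illustration only.

```python
from statistics import mean


def fit_mean_imputer(values):
    # Learn the fill value: the mean of the observed (non-missing) entries
    observed = [v for v in values if v is not None]
    return mean(observed)


def transform(values, fill):
    # Replace missing entries (None) with the learned fill value
    return [fill if v is None else v for v in values]


column = [1.0, 2.0, None, 4.0, 100.0, None]
train, test = column[:4], column[4:]

# Leaky: fitted on the whole column, so the test value 100.0 shifts the fill
leaky_fill = fit_mean_imputer(column)

# Safe: fit on the training split only, then apply to both splits
safe_fill = fit_mean_imputer(train)
train_clean = transform(train, safe_fill)
test_clean = transform(test, safe_fill)
```

Here the leaky fill value is 26.75 while the train-only fill is about 2.33: the test data has changed how the training data is preprocessed.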

> I would be happy to remove the type annotations.

Fine with me.

> that also applies (maybe even more) to encoders, for example MinHash outputs float64

Absolutely! Thanks for raising this. Maybe we should start there.

This is not a bug in TableVectorizer: it's down to the learner to handle missing values (because the strategy to handle missing values must differ depending on the learner). If...

I disagree with your desire to have an option to do it automatically: there is no good default and it tends to depend a lot on the downstream estimator. If...