Nicolas Hug
If we warn, maybe we should fix our warning-catching mechanism first.
What I would personally expect is neither 1. nor 2., but rather 3.: that `pipeline` passes `sample_weight` to all steps and raises an error if *any* of the estimators doesn't...
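To make option 3 concrete, here is a rough sketch of the "pass to every step, fail if any step can't take it" behaviour. It is not scikit-learn code: `accepts_sample_weight`, `fit_all_with_sample_weight`, and the two toy estimators are hypothetical names made up for illustration, and the check simply inspects the `fit` signature. A real pipeline would also transform the data between steps, which is left out here.

```python
import inspect


def accepts_sample_weight(estimator):
    """Return True if estimator.fit declares a `sample_weight` parameter."""
    return "sample_weight" in inspect.signature(estimator.fit).parameters


def fit_all_with_sample_weight(steps, X, y, sample_weight):
    """Hypothetical helper: fit every step with sample_weight.

    `steps` is a list of (name, estimator) pairs. An error is raised
    before anything is fitted if any step cannot accept sample_weight.
    """
    unsupported = [name for name, est in steps if not accepts_sample_weight(est)]
    if unsupported:
        raise TypeError(f"steps do not support sample_weight: {unsupported}")
    for _, est in steps:
        est.fit(X, y, sample_weight=sample_weight)
    return steps


class WeightedMean:
    """Toy estimator whose fit accepts sample_weight."""

    def fit(self, X, y=None, sample_weight=None):
        total = sum(sample_weight)
        self.mean_ = sum(w * x for w, x in zip(sample_weight, X)) / total
        return self


class PlainScaler:
    """Toy estimator whose fit does not accept sample_weight."""

    def fit(self, X, y=None):
        self.max_ = max(X)
        return self
```

With these toy classes, a list containing only `WeightedMean` fits normally, while any list that includes `PlainScaler` raises before a single step is fitted, so the weights are never silently dropped.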
> Stretching this argument of "no good default" ...

Fair point. Just to clarify, I think 3) is still different from the newly updated 2). In 3), all steps are...
> For instance an estimator tag fit_sample_weight

For ref, I tried that in https://github.com/scikit-learn/scikit-learn/pull/13565. It wasn't that easy, but some of the issues I encountered about inheritance should be solved...
I'm getting baited into reviewing this, but it doesn't seem finished yet... Mark as WIP, @glemaitre? ;)
Thanks @pmeier. How will we remember to revisit this before the next release?
Categories are strings, not integers, and string comparison will take much longer. I did observe a significant difference last time I benchmarked this.
```py
import random
import string

str_len = 20
num_categories = 1000
batch_size = 512

categories = [''.join(random.choices(string.ascii_lowercase, k=str_len)) for _ in range(num_categories)]
mapping = {cat: i for (i, cat) in enumerate(categories)}
...
```
I don't know yet, honestly. But I'm open to removing this `categories` logic altogether. We're still in prototype stage and our main focus is loading speed right now.
We haven't pushed them yet, but they're currently at https://pytorch.org/vision/main/transforms.html

@datumbox I took the liberty to fix the link in your message.