Ashwin Srinath
Yes -- that should be OK for cuDF. I also like `multi_quantile` or `quantile_table` over `quantiles`
FWIW, there is a "Pandas compatible" way to do this today: https://github.com/rapidsai/cudf/issues/10967#issuecomment-1138590222. But I'd agree that a more explicit API would be desirable. I wouldn't have any objections to adding...
> It's not clear to me where the name "leaves" came from. To align with PyArrow, we would rename "leaves" to Series.list.values. Note that `values` are distinct from `leaves`: The...
> Would you consider exposing both list.values and list.leaves? It seems important to have a way to un-nest one level at a time (like with list.offsets). Again, while I'm not...
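To make the `values` / `leaves` / `offsets` distinction above concrete, here is a rough pure-Python model of a nested list column. It is only a sketch: the function names mirror the accessors under discussion, but nothing here is cuDF or PyArrow code, and nulls are ignored.

```python
# Illustrative pure-Python model of a nested list column.
# These helpers are hypothetical stand-ins, not cuDF or PyArrow APIs.

def offsets(column):
    """Row boundaries into the one-level-flattened child elements."""
    out = [0]
    for row in column:
        out.append(out[-1] + len(row))
    return out

def values(column):
    """Un-nest exactly one level: concatenate each row's children."""
    return [child for row in column for child in row]

def leaves(column):
    """Un-nest every level, returning the innermost elements."""
    flat = column
    while flat and all(isinstance(item, list) for item in flat):
        flat = [child for item in flat for child in item]
    return flat

column = [[[1, 2], [3]], [[4, 5, 6]]]
row_offsets = offsets(column)  # boundaries of each row within values(column)
one_level = values(column)     # still a list of lists
all_levels = leaves(column)    # fully flattened
```

With a single level of nesting the two coincide; they only differ for lists of lists, which is where un-nesting one level at a time matters.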
Right, which is why I'm suggesting a distinct `DataFrame.list.eval` API (note the namespace).
I agree - let's move the discussion relating to `eval` elsewhere. My broader point though is that we shouldn't require the user to know or care about `.values` and `.offsets`...
Then `list.bucketize()` is an API we may want to consider adding, rather than having each user write their own version of it.
Hi @shaswat-indian - thanks for taking this on! The approach taken here could be improved, particularly by relying less on Pandas internals and more on cuDF's own. I'd...
Great work so far! Could you also please add a benchmark for each of the functions being introduced here? The guidelines for writing benchmarks are still being written up [here](https://github.com/rapidsai/cudf/pull/11122).
In the meantime, I would be very curious to see even informal benchmarks, using something like `%timeit`: how much faster are we compared to Pandas (and how does...
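For an informal comparison outside IPython, something along these lines with the standard-library `timeit` module would do. This is a minimal sketch: `best_time`, `speedup`, and the two workloads are placeholder names for illustration, and the workloads are plain-Python stand-ins for the Pandas and cuDF calls being compared.

```python
import timeit

def best_time(fn, repeat=5, number=100):
    """Best per-call wall time in seconds across `repeat` timing runs."""
    runs = timeit.repeat(fn, repeat=repeat, number=number)
    return min(runs) / number

def speedup(baseline_fn, candidate_fn, **kwargs):
    """How many times faster candidate_fn is than baseline_fn."""
    return best_time(baseline_fn, **kwargs) / best_time(candidate_fn, **kwargs)

# Stand-in workloads (assumptions for illustration only); swap in the
# Pandas call as the baseline and the cuDF call as the candidate.
data = list(range(10_000))

def baseline():
    return sorted(data, reverse=True)

def candidate():
    return data[::-1]

print(f"baseline:  {best_time(baseline):.2e} s per call")
print(f"candidate: {best_time(candidate):.2e} s per call")
print(f"speedup:   {speedup(baseline, candidate):.1f}x")
```

Taking the minimum over several repeats reduces noise from other processes; running it at a few input sizes would also show how the gap changes as the data grows.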