Ashwin Srinath comments

Results: 179 comments of Ashwin Srinath

ENH: Implement multi-column `DataFrame.quantiles`

Yes -- that should be OK for cuDF. I also like `multi_quantile` or `quantile_table` over `quantiles`

[FEA] Get Series.list offsets / Construct Series of lists from offsets and values

FWIW, there is a "Pandas compatible" way to do this today: https://github.com/rapidsai/cudf/issues/10967#issuecomment-1138590222. But I'd agree that a more explicit API would be desirable. I wouldn't have any objections to adding...

[FEA] Get Series.list offsets / Construct Series of lists from offsets and values

> It's not clear to me where the name "leaves" came from. To align with PyArrow, we would rename "leaves" to Series.list.values. Note that `values` are distinct from `leaves`: The...
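To make the distinction between `values` and `leaves` concrete, here is a minimal pure-Python sketch. It uses nested Python lists as a stand-in for a list column; the helper names `values` and `leaves` are illustrative only and are not cuDF or PyArrow calls. The assumption is that `values` removes exactly one level of nesting, while `leaves` flattens all the way down to the innermost elements.

```python
def values(nested):
    """Un-nest exactly one level: concatenate the immediate child lists."""
    return [item for sub in nested for item in sub]


def leaves(nested):
    """Flatten every level of nesting down to the innermost elements."""
    out = []
    for item in nested:
        if isinstance(item, list):
            out.extend(leaves(item))
        else:
            out.append(item)
    return out


# A two-level nested "column" with two rows.
data = [[[1, 2], [3]], [[4, 5, 6]]]

one_level = values(data)   # [[1, 2], [3], [4, 5, 6]]
all_levels = leaves(data)  # [1, 2, 3, 4, 5, 6]
```

For a column with only one level of nesting the two results coincide, which may be part of why the names are easy to confuse.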

[FEA] Get Series.list offsets / Construct Series of lists from offsets and values

> Would you consider exposing both list.values and list.leaves? It seems important to have a way to un-nest one level at a time (like with list.offsets). Again, while I'm not...
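The offsets-plus-values layout this thread refers to can be sketched in plain Python. This is an illustration of the general idea (an Arrow-style list layout), not cuDF's implementation; `to_offsets_and_values` and `from_offsets_and_values` are hypothetical helper names, and nulls are ignored for simplicity.

```python
from itertools import accumulate


def to_offsets_and_values(lists):
    """Split a list of lists into (offsets, flat values).

    Row i occupies values[offsets[i]:offsets[i + 1]].
    """
    offsets = [0, *accumulate(len(sub) for sub in lists)]
    flat = [item for sub in lists for item in sub]
    return offsets, flat


def from_offsets_and_values(offsets, flat):
    """Rebuild the list of lists from offsets and flat values."""
    return [flat[start:stop] for start, stop in zip(offsets, offsets[1:])]


rows = [[1, 2], [], [3, 4, 5]]
offsets, flat = to_offsets_and_values(rows)
# offsets == [0, 2, 2, 5], flat == [1, 2, 3, 4, 5]
rebuilt = from_offsets_and_values(offsets, flat)
```

Applying `to_offsets_and_values` once peels off a single nesting level, so repeated application un-nests a deeper structure one level at a time.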

[FEA] Get Series.list offsets / Construct Series of lists from offsets and values

Right, which is why I'm suggesting a distinct `DataFrame.list.eval` API (note the namespace).

[FEA] Get Series.list offsets / Construct Series of lists from offsets and values

I agree - let's move the discussion relating to `eval` elsewhere. My broader point though is that we shouldn't require the user to know or care about `.values` and `.offsets`...

[FEA] Get Series.list offsets / Construct Series of lists from offsets and values

Then `list.bucketize()` is an API we may want to consider adding, rather than having each user write their own version of it.

Added 'crosstab' and 'pivot_table' features

Hi @shaswat-indian - thanks for taking this on! The approach taken here could be improved, particularly by relying less on Pandas internals and relying instead on cuDF's internals more. I'd...

Added 'crosstab' and 'pivot_table' features

Great work so far! Could you also please add a benchmark for each of the functions being introduced here? The guidelines for writing benchmarks are still being written up [here](https://github.com/rapidsai/cudf/pull/11122).

Added 'crosstab' and 'pivot_table' features

In the meantime, I would be very curious even if you posted informal benchmarks, using something like `%timeit` — how much faster are we compared to Pandas (and how does...
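Outside a notebook, an informal timing of the kind requested here can be approximated with the standard-library `timeit` module. The sketch below is only a template: `crosstab_counts` is a hypothetical pure-Python stand-in, and in practice it would be replaced by the pandas call and the new cuDF call being compared.

```python
import timeit
from collections import Counter


def crosstab_counts(row_keys, col_keys):
    """Stand-in workload: count co-occurrences of (row, column) key pairs."""
    return Counter(zip(row_keys, col_keys))


# Small synthetic input; real benchmarks should vary the data size.
row_keys = ["a", "b", "a", "c"] * 1000
col_keys = ["x", "y", "x", "y"] * 1000

number = 10  # calls per timing run
runs = timeit.repeat(
    lambda: crosstab_counts(row_keys, col_keys),
    number=number,
    repeat=3,
)
# Report the best run, as per-call seconds.
best_per_call = min(runs) / number
print(f"best of 3: {best_per_call * 1e3:.3f} ms per call")
```

Taking the minimum over several repeats reduces noise from other processes; timing the same function at a few input sizes gives a rough picture of how the speedup scales.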