Gregory Kimball
Hello @shwina and @bdice, [bucketize](https://github.com/pytorch/torcharrow/blob/main/csrc/velox/functions/rec/bucketize.h) is a feature that we might unlock if we could construct a list column from offsets and values. Bucketize is performed on leaves and uses...
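A list column is stored as an offsets child plus a flat values child. Bucketizing the leaves therefore never changes the row structure: the values are transformed and the offsets are reused unchanged. Below is a minimal pure-Python sketch of that idea. It assumes the bucket index is the number of boundaries less than or equal to the value (`bisect.bisect_right`). torcharrow's exact boundary semantics may differ. `bucketize_list_column` and `to_rows` are hypothetical helpers, not cudf APIs.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def bucketize_list_column(
    offsets: Sequence[int], values: Sequence[float], buckets: Sequence[float]
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: bucketize every leaf, keep the row structure."""
    # Operate only on the flat leaf values; row boundaries do not matter here.
    # Assumption: bucket index = number of boundaries <= value.
    new_values = [bisect_right(buckets, v) for v in values]
    # Same number of leaves per row, so the parent offsets are copied as-is.
    return list(offsets), new_values


def to_rows(offsets: Sequence[int], values: Sequence[int]) -> List[List[int]]:
    """Hypothetical helper: materialize rows from offsets + values."""
    return [list(values[offsets[i]:offsets[i + 1]]) for i in range(len(offsets) - 1)]


# Rows [[1, 5, 10], [3], [7, 12]] in offsets/values form.
offsets, values = bucketize_list_column([0, 3, 4, 6], [1, 5, 10, 3, 7, 12], [2, 6, 11])
print(to_rows(offsets, values))  # [[0, 1, 2], [1], [2, 3]]
```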
To my surprise the `explode` trick from #10967 works here as well:
```
def bucketize(a, buckets):
    a_x = a.explode()
    b = a_x * 0
    for k in buckets:
        b +=...
```
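For comparison, here is a runnable sketch of the explode approach written against pandas. pandas' `Series.explode` keeps the parent row label on every leaf, so a `groupby(level=0)` can regroup the leaves into lists. This is not the snippet from the comment. The `>=` comparison, the float cast, the default `RangeIndex`, and the absence of empty or null lists are all assumptions. Note that pandas turns an empty list into a single NaN row.

```python
import pandas as pd


def bucketize(a: pd.Series, buckets) -> pd.Series:
    # One row per leaf; the index repeats the label of the parent row.
    # Assumes no empty/null lists (pandas explodes [] into a NaN row).
    a_x = a.explode().astype("float64")
    # Start every leaf in bucket 0, sharing a_x's index so no realignment occurs.
    b = pd.Series(0, index=a_x.index, dtype="int64")
    for k in buckets:
        # Assumption: a leaf moves up one bucket per boundary it is >= to.
        b += (a_x >= k).astype("int64")
    # Regroup leaves by their parent row label to rebuild one list per row.
    return b.groupby(level=0).apply(list)


result = bucketize(pd.Series([[1, 5, 10], [3], [7, 12]]), [2, 6, 11])
print(result.tolist())  # [[0, 1, 2], [1], [2, 3]]
```

The regrouping step is what a "list column from offsets and values" constructor would replace. Because the offsets are already known, no groupby would be needed.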
@rnukala1 is this issue still relevant?
With the addition of list column support for `distinct` in libcudf (#10641), this issue just needs python bindings.
In 22.06 `drop_duplicates` uses a sort-based algorithm and relies on the lexicographic comparator. We expect this will be closed by #11129
```
import cudf
df = cudf.DataFrame({"a": [[1, 3, 5,...
```
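To illustrate why a sort-based `drop_duplicates` needs a lexicographic comparator for list rows, here is a pure-Python sketch. Sorting brings equal rows next to each other, so duplicates can be found by comparing each row with its predecessor. It relies on Python comparing lists lexicographically. `sort_based_distinct` is a hypothetical helper, not the libcudf implementation, and it ignores nulls.

```python
from typing import List, Sequence


def sort_based_distinct(rows: Sequence[List[int]]) -> List[int]:
    """Hypothetical helper: indices of the first occurrence of each distinct row."""
    # Stable sort of row indices; lists compare lexicographically in Python.
    order = sorted(range(len(rows)), key=lambda i: rows[i])
    keep = []
    for pos, i in enumerate(order):
        # Equal rows are adjacent after sorting; stability means the first
        # index in each run is the earliest occurrence (like keep="first").
        if pos == 0 or rows[i] != rows[order[pos - 1]]:
            keep.append(i)
    # Restore the original row order of the surviving rows.
    return sorted(keep)


rows = [[1, 3, 5], [2], [1, 3, 5], [], [2]]
print(sort_based_distinct(rows))  # [0, 1, 3]
```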
> API where we can send compressed pages and metadata to CUDF for decoding. The metadata would include things like the file and row group that the pages came from...
Would this be possible with `apply` instead of `agg`? Is there an extension of #11452 that could accept some custom aggregations?
Thanks @etseidl for suggesting this change. Please excuse the delay; we will be taking another look for the 22.10 release.
Please feel free to re-open if the issue is not solved. Thank you @rjzamora for your contribution.
Thank you @PointKernel and @davidwendt for investigating the missing/inconsistent GPU metrics data with nvbench. If it's not the CUPTI dependency, what is the root cause of the problem? Lowering the...