Benjamin Zaitlen

198 comments by Benjamin Zaitlen

@TomAugspurger this is an attempt to get multi-column sorting working in Dask, which requires a multi-column quantile.
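To illustrate why a multi-column sort depends on a multi-column quantile, here is a minimal pure-Python sketch. It is not Dask's implementation: the function names `multi_column_quantiles` and `assign_partition` are hypothetical, rows are modeled as plain tuples compared lexicographically, and the boundaries are taken from a full sort, whereas a distributed implementation would presumably estimate them from samples.

```python
from bisect import bisect_right


def multi_column_quantiles(rows, npartitions):
    """Return npartitions - 1 boundary rows (hypothetical helper).

    Rows are tuples, so Python compares them column by column,
    which is the ordering a multi-column sort needs.
    """
    ordered = sorted(rows)
    n = len(ordered)
    return [ordered[(i * n) // npartitions] for i in range(1, npartitions)]


def assign_partition(row, divisions):
    """Pick the output partition for a row from the boundary rows."""
    return bisect_right(divisions, row)


rows = [(2, "b"), (1, "z"), (2, "a"), (1, "a"), (3, "c"), (2, "c")]
divisions = multi_column_quantiles(rows, npartitions=3)

# Shuffle step: route every row to its partition.
partitions = [[] for _ in range(3)]
for row in rows:
    partitions[assign_partition(row, divisions)].append(row)

# Sorting each partition locally and concatenating gives a global sort.
result = [row for part in partitions for row in sorted(part)]
```

Because the boundaries are whole rows rather than values from a single column, rows that tie on the first column are still routed consistently by the second.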

@shwina, do you think we could alias `quantiles` in cuDF if we do end up changing the name here?

I'm asking if other folks have seen this issue recently. #9116 succeeded [shortly after](https://gpuci.gpuopenanalytics.com/job/dask/job/dask/job/prb/job/dask-prb/2080/) the failing build.

@AntSimi thank you for the bug report. I verified similar behavior, but noticed that across multiple runs the result is sometimes correct. Not sure what's going on here. Can I ask...

@pentschev it looks like the API in RMM has changed?

> 11:50:01 > rmm_usage = c.run_on_scheduler(rmm.get_info)
> 11:50:01 E AttributeError: module 'rmm' has no attribute 'get_info'

Are you aware of...

Apparently this was removed [last summer](https://github.com/rapidsai/rmm/issues/390) and was updated [in this PR](https://github.com/rapidsai/rmm/pull/626/). @jrbourbeau please merge that PR when you feel it's ready and we'll fix that...

@jrbourbeau we addressed the `dask-cudf` question [here](https://github.com/rapidsai/dask-build-environment/pull/1/files#r681128007): no, we do not need it right now.

At the moment it looks like Dask chooses to do nothing if the column name matches the index name: https://github.com/dask/dask/blob/0b637ab775fa177d0af737e4648a89484aee5c74/dask/dataframe/core.py#L4813-L4817. @jorloplaz do you have any thoughts on whether this optimization was...

While this issue is focused on writes, I think it's still relevant to how Dask handles multiple processes and HDF5 processing: https://github.com/dask/dask/issues/3074

@jakirkham do you have any thoughts about what could be going on here?