Benjamin Zaitlen

Results: 27 issues authored by Benjamin Zaitlen

In https://github.com/dask/dask/pull/9483 Dask now has implementations of `median` and `median_approximate`. These should be available with dask_cudf. cuDF currently raises a NotImplementedError with `mean(axis=1)`:

```python
cdf = cudf.datasets.timeseries()
cdf = dd.from_pandas(cdf, npartitions=2)
...
```
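As an aside, a toy sketch of why a partition-wise approximation of the median is cheap for distributed dataframes. This is an illustration only, not Dask's actual `median_approximate` algorithm; `approx_median` and the sample partitions are made up for the example.

```python
import statistics

def approx_median(partitions):
    """Median-of-medians sketch: take the exact median of each partition
    (one local pass, no shuffle), then the median of those values as a
    cheap approximation of the global median."""
    per_partition = [statistics.median(p) for p in partitions]
    return statistics.median(per_partition)

parts = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(approx_median(parts))  # 5 -- exact here because partitions are balanced
```

With skewed partitions the result is only approximate, which is why the exact `median` is restricted to cases where no cross-partition reduction is needed.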

feature request
? - Needs Triage

Seen in #7089 https://github.com/dask/distributed/actions/runs/3154045824/jobs/5132914151

```
____________________________ test_drop_with_waiter _____________________________
args = (), kwds = {}

    @wraps(func)
    def inner(*args, **kwds):
        with self._recreate_cm():
>           return func(*args, **kwds)

../../../miniconda3/envs/dask-distributed/lib/python3.10/contextlib.py:79:
_ _ _ _ _...
```

flaky test

Can one use `aioes` with `ca_certs` passed in as a parameter, as in elasticsearch-py?

```python
es = Elasticsearch(
    ['localhost', 'otherhost'],
    http_auth=('user', 'secret'),
    port=443,
    use_ssl=True,
    ca_certs='/path/to/cacert.pem',
    client_cert='/path/to/client_cert.pem',
    client_key='/path/to/client_key.pem',
)
```
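A common fallback for async clients that don't expose `ca_certs` directly is to build an `ssl.SSLContext` yourself and hand that to the client. Whether `aioes` accepts a context is not confirmed here; the sketch below only shows the standard-library side of the pattern, with the certificate paths left as comments since they are placeholders.

```python
import ssl

# Build a client-side context that verifies the server certificate,
# equivalent to elasticsearch-py's use_ssl=True default behaviour.
ctx = ssl.create_default_context()

# In real use you would point at your CA bundle and client keypair:
# ctx = ssl.create_default_context(cafile='/path/to/cacert.pem')
# ctx.load_cert_chain('/path/to/client_cert.pem', '/path/to/client_key.pem')

print(ctx.verify_mode == ssl.CERT_REQUIRED)  # server certs are verified
```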

This PR adds a `mybinder` badge to the README.md for easy notebook launch, along with an `environment.yml`. The `environment.yml` defines the Python dependencies necessary to execute the notebook.

I have a somewhat representative (and currently failing) example of merging two dataframes in a resource-constrained environment:

- df_base = 295GB and 10674 partitions
- df_other = 466GB and 2576 partitions
...
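For intuition, a toy sketch of the shape of such a merge: index the smaller side once, then stream the larger side through it one partition at a time, so peak memory is bounded by one partition plus the index. This is illustrative plain Python, not Dask's shuffle-based merge; `partitioned_merge` and the sample data are invented for the example.

```python
from collections import defaultdict

def partitioned_merge(left_parts, right_rows, key=0):
    """Hash-join sketch: build an index over the smaller side, then
    stream each partition of the larger side through it."""
    index = defaultdict(list)
    for row in right_rows:              # smaller side, held in memory
        index[row[key]].append(row)
    for part in left_parts:             # larger side, one partition at a time
        for row in part:
            for match in index.get(row[key], []):
                yield row + match[1:]   # concatenate non-key columns

left = [[(1, 'a'), (2, 'b')], [(3, 'c')]]
right = [(1, 'x'), (3, 'y')]
print(list(partitioned_merge(left, right)))  # [(1, 'a', 'x'), (3, 'c', 'y')]
```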

Memory fragmentation is a problem with RMM and can lead to OOM errors. https://github.com/rapidsai/dask-cuda/pull/984 made fragmentation errors more visible. There is an experimental Virtual Memory Manager in dask-cuda (https://github.com/rapidsai/dask-cuda/pull/998) to help...
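The failure mode is easy to show with a toy allocator, unrelated to RMM's actual implementation: total free memory can exceed a request while no single contiguous block satisfies it.

```python
def first_fit(free_blocks, size):
    """Return the index of the first free block that can hold `size`,
    or None if no single block is large enough."""
    for i, block in enumerate(free_blocks):
        if block >= size:
            return i
    return None

free_blocks = [60, 50, 40]           # MB of free space, non-contiguous
print(sum(free_blocks) >= 100)       # True: 150 MB free overall
print(first_fit(free_blocks, 100))   # None: no contiguous 100 MB block -> OOM
```

Virtual-memory-backed pools sidestep this by mapping non-contiguous physical pages into a contiguous virtual range.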

We've had several issues come in related to defaults/cluster configurations (https://github.com/rapidsai/dask-cuda/issues/990 / https://github.com/rapidsai/dask-cuda/issues/348 / ...) and a general request to support "Configuration Profiles". As there are many ways to configure...
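One possible shape for such profiles, sketched with a plain dict merge; the profile names and keys below are assumptions for illustration, not dask-cuda's actual configuration schema.

```python
# Hypothetical defaults and named profiles (keys invented for the sketch).
DEFAULTS = {"protocol": "tcp", "rmm_pool_size": None, "jit_unspill": False}

PROFILES = {
    "ucx": {"protocol": "ucx"},
    "spill-heavy": {"jit_unspill": True, "rmm_pool_size": "24GB"},
}

def load_profile(name):
    """Layer a named profile on top of the shared defaults."""
    return {**DEFAULTS, **PROFILES[name]}

print(load_profile("ucx"))
```

A user would then pick one tested bundle of settings by name instead of tuning each flag individually.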

We should build a benchmark to profile spilling in at least the following scenarios:

- performance: measure that our spilling is efficient in a larger workflow -- maybe this is...
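The smallest useful building block for such a benchmark is timing a spill/unspill round trip. The sketch below uses pickle and a temp file as stand-ins for the real serialization path; it is a starting point, not the proposed benchmark itself.

```python
import os
import pickle
import tempfile
import time

def time_spill_roundtrip(obj):
    """Time writing `obj` to disk ('spill') and reading it back
    ('unspill'); returns the two durations in seconds."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        t0 = time.perf_counter()
        with open(path, 'wb') as f:
            pickle.dump(obj, f)
        spill = time.perf_counter() - t0

        t0 = time.perf_counter()
        with open(path, 'rb') as f:
            restored = pickle.load(f)
        unspill = time.perf_counter() - t0

        assert restored == obj      # round trip must be lossless
        return spill, unspill
    finally:
        os.unlink(path)

print(time_spill_roundtrip(list(range(100_000))))
```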

inactive-30d

Following the addition of the pack/unpack work where cuDF tables can be stored as a single buffer, we'd like to explore compression of that buffer in hopes of better memory...
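The appeal of the single-buffer layout is that compression becomes one call over one contiguous buffer. A stdlib sketch, with zlib standing in for whatever codec would actually be used on packed cuDF buffers:

```python
import pickle
import zlib

# Stand-in for a packed table: one contiguous bytes buffer.
packed = pickle.dumps({"col_a": list(range(1000)), "col_b": ["x"] * 1000})

compressed = zlib.compress(packed)
print(len(compressed) < len(packed))          # repetitive data shrinks
assert zlib.decompress(compressed) == packed  # lossless round trip
```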

inactive-90d
inactive-30d

@charlesbluca [recently added](https://github.com/rapidsai/cudf/pull/8153) a new serialization method for cuDF DataFrames where data is now serialized into two buffers: one for metadata and another for the data itself. As opposed to...
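To make the two-buffer idea concrete, here is a minimal wire-format sketch: a length-prefixed metadata frame followed by the raw data frame, so the receiver can split them again. This is illustrative only, not cuDF's actual layout; `serialize`/`deserialize` are invented names.

```python
import json
import struct

def serialize(metadata, data):
    """Pack a metadata dict (JSON) and a raw data buffer into one frame,
    prefixed with the metadata length (little-endian uint32)."""
    meta = json.dumps(metadata).encode()
    return struct.pack("<I", len(meta)) + meta + data

def deserialize(buf):
    """Split a frame back into (metadata dict, data buffer)."""
    (meta_len,) = struct.unpack_from("<I", buf)
    meta = json.loads(buf[4:4 + meta_len])
    return meta, buf[4 + meta_len:]

frame = serialize({"dtype": "int64", "length": 3}, b"\x01\x02\x03")
print(deserialize(frame))  # ({'dtype': 'int64', 'length': 3}, b'\x01\x02\x03')
```

Keeping metadata separate means the (small) metadata frame can be inspected without touching the (large) data frame.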

inactive-90d
inactive-30d