Lawrence Mitchell
`from_array` in your example has to build the full 40GiB array on the client side, that is, as a single allocation, and then move it to the workers. Since your...
> "Best practice is to generate the chunked array directly on the workers" can you give an example of how to do that? Use one of the [creation mechanisms](https://docs.dask.org/en/stable/array-creation.html#) that...
OK, in this case, I recommend using [`segysak`](https://github.com/trhallam/segysak) to load the SEG-Y files and then interface with dask. They have an [example doing just this](https://segysak.readthedocs.io/en/latest/examples/example_segysak_dask.html) in their documentation, which should...
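Independent of segysak, the general pattern for building the chunked array directly on the workers is to describe each chunk lazily and stitch the pieces together, so nothing large is ever allocated on the client. A minimal sketch, assuming a hypothetical per-chunk loader `read_block` and made-up chunk shapes:

```python
import dask
import dask.array as da
import numpy as np

def read_block(i):
    # hypothetical loader: in practice this would read block i of the
    # SEG-Y data; it only runs on a worker when the task executes
    return np.random.random((1000, 1000))

# describe every chunk lazily, then assemble them into one dask array
blocks = [
    da.from_delayed(dask.delayed(read_block)(i), shape=(1000, 1000), dtype=float)
    for i in range(40)
]
x = da.concatenate(blocks, axis=0)  # still lazy, no data moved yet
```

Each chunk is only materialised when a worker runs the corresponding task, so the full array never exists as a single client-side allocation.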
I am still a little confused: `da.from_array` delivers an array on the workers, no? Why are you then doing `client.scatter` with it?
> That didn't work, but setting the environment variable `export DASK_DISTRIBUTED__DIAGNOSTICS__NVML=False` did. Thanks for pointing me in the right direction.

This usually indicates a bug in the way dask config...
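For reference, that environment variable maps onto the `distributed.diagnostics.nvml` config key, so the same thing can be expressed through dask's config directly; a sketch (it needs to be in effect before the scheduler and workers start):

```python
import dask

# programmatic equivalent of DASK_DISTRIBUTED__DIAGNOSTICS__NVML=False;
# set this before creating the cluster so the processes inherit it
dask.config.set({"distributed.diagnostics.nvml": False})
```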
> I'll try and find some time to handle this properly in distributed.

dask/distributed#6678
Short-term fix disabling compression is in #957.
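Until that lands, compression can also be switched off by hand through dask's config; a sketch assuming the standard `distributed.comm.compression` key:

```python
import dask

# turn off on-the-wire compression entirely (the short-term workaround);
# set this before creating the client/cluster so workers pick it up too
dask.config.set({"distributed.comm.compression": None})
```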
So, suppose that there are 4 GPUs in the system, but `CUDA_VISIBLE_DEVICES=1,3`. Then:

```python
from cuda import cuda

cuda.cuInit(0)
cuda.cuDeviceCanAccessPeer(0, 1)  # can device 1 talk to device 3? =>...
```
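To see the whole picture at once, here is a small sketch (same `cuda-python` bindings) that prints the peer-access matrix for whichever devices are currently visible; note the ordinals are renumbered under `CUDA_VISIBLE_DEVICES`:

```python
from cuda import cuda

err, = cuda.cuInit(0)
err, ndev = cuda.cuDeviceGetCount()
for i in range(ndev):
    for j in range(ndev):
        if i == j:
            continue
        # visible ordinals i and j, i.e. positions in CUDA_VISIBLE_DEVICES
        err, ok = cuda.cuDeviceCanAccessPeer(i, j)
        print(f"device {i} -> device {j}: {'peer access' if ok else 'no peer access'}")
```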
> The easiest way to test is actually to just test with UCX-Py between peers.

IIUC:

```
# let both processes see both devices
$ CUDA_VISIBLE_DEVICES=0,1 python send-recv-core.py --reuse-alloc -d...
```
If I do:

```
$ dask-scheduler --protocol ucx --scheduler-file foo.json &
$ for i in $(seq 0 7); do CUDA_VISIBLE_DEVICES=$i dask-cuda-worker --scheduler-file foo.json --protocol ucx &; done
$ python local_cudf_merge.py...
```
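For comparison, roughly the same cluster (one UCX worker per GPU) can be started programmatically; a sketch using `dask_cuda.LocalCUDACluster`:

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# one worker per visible GPU, all talking UCX, roughly matching the
# manual dask-scheduler / dask-cuda-worker launch above
cluster = LocalCUDACluster(protocol="ucx")
client = Client(cluster)
```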