Peter Andreas Entschev
> When reading through the documentation of spilling, https://docs.rapids.ai/api/dask-cuda/stable/spilling.html, it is unclear how the CPU-GPU communication is done if UCX is actively chosen as the communication protocol. Will workers write...
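For context, spilling and the communication protocol are configured independently of each other; a minimal sketch (the parameter values are illustrative, not from the thread):

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Spilling is driven by device_memory_limit and behaves the same regardless of
# the protocol used for worker-to-worker communication.
cluster = LocalCUDACluster(
    protocol="ucx",              # inter-worker transfers over UCX
    enable_nvlink=True,          # allow UCX to use NVLink where available
    device_memory_limit="16GB",  # spill device->host beyond this, per worker
)
client = Client(cluster)
```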
In the case above you're only using `cudaMemcpy`, which will work either way, even if it has to go through the host over PCIe. The original problem was with CUDA IPC...
Yes, that looks correct. Were you able to confirm the same on Dask-CUDA with multiple workers?
It looks fine, considering the known instability of results between iterations. Just to have one last metric, could you also run both by specifying `UCX_TLS=^cuda_ipc`?
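For reference, disabling the CUDA IPC transport only requires setting the environment variable before UCX is initialized in the process; a minimal sketch:

```python
import os

# The leading "^" means "all default transports except the ones listed",
# so this keeps the remaining transports but disables CUDA IPC.
os.environ["UCX_TLS"] = "^cuda_ipc"
```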
Yup, agreed this looks right. It seems we indeed don't need `CUDA_VISIBLE_DEVICES` to list all GPUs; listing only the first one should suffice. Therefore, the idea I have in mind is...
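To illustrate the proposal (values are hypothetical, assuming a 4-GPU machine and the worker assigned GPU 2):

```python
# Per-worker environment today vs. the proposal above.
current = {"CUDA_VISIBLE_DEVICES": "2,3,0,1"}  # all GPUs, own GPU listed first
proposed = {"CUDA_VISIBLE_DEVICES": "2"}       # only the worker's own GPU
```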
> I think this is easy because:
>
> ```python
> import numba.cuda
>
> uuid2devnum = dict((dev.uuid, dev.id) for dev in numba.cuda.list_devices())
> ```

Sorry, I missed this, but...
> Once we decide on a replacement, it would be good to solicit feedback from other users to make sure it still satisfies their use cases Do you have a...
> Hi, if this feature is planned for future release, would be great if there's a method abstracted by dask for downstream projects to obtain the GPU ordinal. Currently, XGBoost...
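As an illustration of what downstream projects can do today in the absence of such an abstraction (this helper is hypothetical, not an existing Dask API):

```python
import os

def first_visible_device():
    """Hypothetical helper: inside a Dask-CUDA worker, the GPU assigned to the
    process is the first entry of CUDA_VISIBLE_DEVICES, which may be an integer
    ordinal or a GPU UUID."""
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    return visible.split(",")[0] if visible else None
```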
I'm not sure if you're trying to do something much more specific, but one of the features Dask-CUDA provides is setting CPU affinity (NUMA-ness) for each GPU, see https://github.com/rapidsai/dask-cuda/blob/09196cb5c92effca6da660231e58a5bf4ac72c76/dask_cuda/cuda_worker.py#L215 and https://github.com/rapidsai/dask-cuda/blob/09196cb5c92effca6da660231e58a5bf4ac72c76/dask_cuda/local_cuda_cluster.py#L311...
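As a rough sketch of what that affinity-setting amounts to (this is not the linked Dask-CUDA code, just NVML queried directly through pynvml):

```python
import os
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0

# Bitmask of CPUs local to this GPU's NUMA node, returned as 64-bit words
# (4 words covers up to 256 CPUs).
words = pynvml.nvmlDeviceGetCpuAffinity(handle, 4)
cpus = [
    64 * i + bit
    for i, word in enumerate(words)
    for bit in range(64)
    if (word >> bit) & 1
]

# Pin the current process to the CPUs closest to that GPU.
os.sched_setaffinity(0, cpus)
```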
Thanks for the details @lmeyerov. As you noticed, `--local-directory` is something you could set to have spilling on a high-bandwidth filesystem. However, the rest of your question...
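For illustration, the equivalent when using `LocalCUDACluster` (the path and limits below are hypothetical):

```python
from dask_cuda import LocalCUDACluster

cluster = LocalCUDACluster(
    local_directory="/mnt/nvme/dask-scratch",  # scratch space for host->disk spilling
    device_memory_limit="16GB",                # spill GPU->host beyond this
    memory_limit="64GB",                       # host memory budget before spilling to disk
)
```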