[BUG] Dask-CUDA does not work with Merlin/NVTabular
As pointed out by @oliverholworthy in https://github.com/NVIDIA-Merlin/core/pull/274#discussion_r1160187901, cuda.is_available() is used in merlin.core.compat to check for CUDA support. Unfortunately, calling cuda.is_available() is a known problem for dask-cuda, because it creates a CUDA context in the parent process before Dask-CUDA has a chance to initialize the cluster.
This most likely means that Merlin/NVTabular has not worked properly with Dask-CUDA for more than six months now. For example, the following code produces an OOM error on 32GB V100s:
import time

from merlin.core.utils import Distributed

if __name__ == "__main__":
    with Distributed(rmm_pool_size="24GiB"):
        time.sleep(30)
You will also see an error if you don't import any Merlin/NVTabular code, but do use the offending cuda.is_available() call yourself:
import time

from numba import cuda  # This is fine
cuda.is_available()     # This is NOT

from dask_cuda import LocalCUDACluster

if __name__ == "__main__":
    with LocalCUDACluster(rmm_pool_size="24GiB") as cluster:
        time.sleep(30)
Meanwhile, the code works fine if you don't use the offending call or import code that also imports merlin.core.compat:
import time

from dask_cuda import LocalCUDACluster

if __name__ == "__main__":
    with LocalCUDACluster(rmm_pool_size="24GiB") as cluster:
        time.sleep(30)
cc @jperez999 @karlhigley
@jperez999 - Do you think this line is actually necessary? If we already have HAS_GPU from the line above, maybe we can just do:

if not HAS_GPU:
    cuda = None
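For illustration, a rough sketch of how that guard could sit in merlin.core.compat; the surrounding structure below (including the stand-in HAS_GPU computation via PyNVML) is an assumption for illustration, not the actual file contents:

# Hypothetical stand-in for however merlin.core.compat actually computes HAS_GPU;
# PyNVML can report the visible device count without creating a CUDA context.
try:
    import pynvml

    pynvml.nvmlInit()
    HAS_GPU = pynvml.nvmlDeviceGetCount() > 0
    pynvml.nvmlShutdown()
except Exception:
    HAS_GPU = False

try:
    from numba import cuda
except ImportError:
    cuda = None

# The suggested guard: rely on HAS_GPU instead of cuda.is_available(), so that
# importing this module never initializes CUDA in the parent process.
if not HAS_GPU:
    cuda = None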
Besides the above, I was looking at the code in more detail and I see the following block: https://github.com/NVIDIA-Merlin/core/blob/6e52b48140615708b59926b5f9c3601f8feeab93/merlin/core/compat/init.py#L102-L105
This creates a new context on a GPU only to query the memory size, and a CUDA context should never be created before Dask initializes the cluster. Also note that pynvml_mem_size has an equivalent code block: https://github.com/NVIDIA-Merlin/core/blob/6e52b48140615708b59926b5f9c3601f8feeab93/merlin/core/compat/init.py#L57-L60
The PyNVML code will NOT create a CUDA context and is safe to run before Dask. Is there a reason why you're using the Numba code block to query GPU memory instead of always using PyNVML for that?
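For reference, a minimal sketch (not the Merlin implementation) of querying total device memory with PyNVML alone; the device index 0 here is an assumption. Because PyNVML never initializes CUDA, this is safe to run in the parent process before dask-cuda spins up its workers:

import pynvml

pynvml.nvmlInit()
# Query the total memory of device 0 via NVML; no CUDA context is created.
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
total_bytes = pynvml.nvmlDeviceGetMemoryInfo(handle).total
pynvml.nvmlShutdown()
print(f"Total GPU memory: {total_bytes / 2**30:.1f} GiB")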