[BUG] "import cudf" changes the current device ID
It looks like "import cudf" resets the current CUDA device ID to 0 in the 22.06 release.
Environment: RAPIDS 22.06, installed with:
conda create -n rapids-22.06 -c rapidsai -c nvidia -c conda-forge \
rapids=22.06 python=3.9 cudatoolkit=11.5
Repro:
In [1]: import cupy
In [2]: cupy.cuda.runtime.getDevice()
Out[2]: 0
In [3]: cupy.cuda.runtime.setDevice(1)
In [4]: cupy.cuda.runtime.getDevice()
Out[4]: 1
In [5]: import cudf
In [6]: cupy.cuda.runtime.getDevice()
Out[6]: 0
Hi @wbo4958 - thanks for reporting! This looks like an issue with CUDA Python:
In [7]: import cuda.cudart
In [8]: import cupy
In [9]: cupy.cuda.runtime.setDevice(1)
In [10]: cupy.cuda.runtime.getDevice()
Out[10]: 1
In [11]: cuda.cudart.cudaGetDeviceCount()
Out[11]: (<cudaError_t.cudaSuccess: 0>, 2)
In [12]: cupy.cuda.runtime.getDevice()
Out[12]: 0
While we investigate further, perhaps you could use the environment variable CUDA_VISIBLE_DEVICES instead to control which GPU to use?
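A minimal sketch of the environment-variable approach, assuming the variable is set before any CUDA library is imported in the process. The helper name `select_gpu` is hypothetical and not part of cupy, cudf, or cuda-python:

```python
import os


def select_gpu(index: int) -> None:
    """Expose only one physical GPU to this process.

    Must run before cupy, cudf, or any other CUDA library is imported,
    because the variable is read when CUDA is initialized.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)


# Hypothetical usage: make physical GPU 1 the only visible device.
select_gpu(1)

# Import the GPU libraries only after this point, e.g.:
#   import cupy
#   import cudf
```

With a single GPU visible, that GPU is enumerated as device 0 inside the process, so a reset to device 0 during import should no longer move work to a different physical GPU.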
One way to work around this bug is to import cudf before cupy.
If for some reason you cannot do that, yet another workaround involves calling cuda.cudart.cudaGetDevice() before importing cudf:
In [1]: import cupy
In [2]: import cuda.cudart
In [3]: cuda.cudart.cudaGetDevice()
Out[3]: (<cudaError_t.cudaSuccess: 0>, 0)
In [4]: cupy.cuda.runtime.getDevice()
Out[4]: 0
In [5]: cupy.cuda.runtime.setDevice(1)
In [6]: cupy.cuda.runtime.getDevice()
Out[6]: 1
In [7]: import cudf
In [8]: cupy.cuda.runtime.getDevice()
Out[8]: 1
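If neither workaround fits, one more option might be to record the current device before the import and restore it afterwards. This is only a sketch, not something confirmed in this thread: the helper name `preserve_device` is hypothetical, and it assumes that calling setDevice again after the import is enough to undo the reset.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def preserve_device(
    get_device: Callable[[], int],
    set_device: Callable[[int], None],
) -> Iterator[None]:
    """Restore the current device if the wrapped code changes it.

    The getter and setter are passed in so the helper does not import
    any GPU library itself.
    """
    saved = get_device()
    try:
        yield
    finally:
        # Only call the setter when the device actually changed.
        if get_device() != saved:
            set_device(saved)


# Hypothetical usage with cupy (assumption, not verified on 22.06):
#   import cupy
#   cupy.cuda.runtime.setDevice(1)
#   with preserve_device(cupy.cuda.runtime.getDevice,
#                        cupy.cuda.runtime.setDevice):
#       import cudf
```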
Thx @shwina
@shwina is this something we need to follow up with the cuda-python team on?
This is fixed by the cuda-python changes that resolved nvidia/cuda-python#24, which are in v11.8, so it should be resolved once we move to that version.