CUDA: Support CUDA Toolkit conda packages from NVIDIA
NVIDIA now publishes conda packages containing the CUDA toolkit: https://anaconda.org/nvidia
These packages place components in different locations to the Anaconda- and conda-forge-maintained packages. This PR updates Numba's library search logic so that it can find the libraries in these packages. The locations of the components in these packages are (all relative to `$CONDA_PREFIX`):

- NVVM is placed in `nvvm/lib64` on Linux and `nvvm/bin` on Windows
- Static CUDA libraries are in `lib` on Linux, and `Lib/x64` on Windows
- Dynamic CUDA libraries are in `lib` on Linux, and `bin` on Windows
- Libdevice is in `nvvm/libdevice`
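
For illustration, here is a minimal sketch of how the layout above could be turned into per-platform search paths. The helper name `nvidia_conda_paths` and the dictionary keys are invented for this example; this is not the actual `cuda_paths.py` implementation.

```python
# Hypothetical sketch: map the NVIDIA conda package layout described above
# to per-platform search directories. Names here are illustrative only.
import os
import sys


def nvidia_conda_paths():
    """Return candidate directories for CUDA components in the NVIDIA
    conda package layout, or None outside a conda environment."""
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix is None:
        return None

    if sys.platform == "win32":
        return {
            "nvvm": os.path.join(prefix, "nvvm", "bin"),
            "static_libs": os.path.join(prefix, "Lib", "x64"),
            "dynamic_libs": os.path.join(prefix, "bin"),
            "libdevice": os.path.join(prefix, "nvvm", "libdevice"),
        }

    return {
        "nvvm": os.path.join(prefix, "nvvm", "lib64"),
        "static_libs": os.path.join(prefix, "lib"),
        "dynamic_libs": os.path.join(prefix, "lib"),
        "libdevice": os.path.join(prefix, "nvvm", "libdevice"),
    }
```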
I also noticed that locating libcudadevrt was broken when using a non-conda-installed toolkit on Windows - this is also resolved by this PR.
Along with this, I thought it would also be helpful to display the locations searched for `libcuda.so` (as the location of `libcuda.so` can be an issue, as in #7104). This PR shows the search locations and the path used for a successful load, but unfortunately this is sometimes relative, with the actual path determined by the system loader. There is no easy way to check the absolute path of the loaded library, but something more complex to report the exact path could be added in a future PR. A little refactoring was needed in `cudadrv.py` to separate the path determination from the actual loading, so that the path info could be presented by `libs.test()`.
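
To illustrate that separation, here is a rough sketch under assumed names (`find_driver_candidates` and `load_driver` are hypothetical, not the actual `cudadrv.py` API): one function produces the candidate paths so they can be reported by `libs.test()`, and another performs the actual load.

```python
# Hypothetical sketch of separating path determination from loading.
# The Linux candidate list mirrors the search locations shown in the
# libs.test() output later in this thread.
import sys
from ctypes import CDLL


def find_driver_candidates():
    """Return the candidate paths that will be tried for the CUDA driver."""
    if sys.platform == "win32":
        return ["nvcuda.dll"]
    return [
        "libcuda.so",
        "libcuda.so.1",
        "/usr/lib/libcuda.so",
        "/usr/lib/libcuda.so.1",
        "/usr/lib64/libcuda.so",
        "/usr/lib64/libcuda.so.1",
    ]


def load_driver():
    """Try each candidate in turn and return (library, path). The path may
    be relative, in which case the system loader resolves the real file."""
    errors = []
    for path in find_driver_candidates():
        try:
            return CDLL(path), path
        except OSError as exc:
            errors.append((path, exc))
    raise OSError(f"Could not load the CUDA driver; tried: {errors}")
```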
The logic additions in `cuda_paths.py` are a little unwieldy and contrived; unfortunately, as the file has evolved, things have got a little out of control. It's difficult to get this logic both correct and minimal, so I've left it as it is for now rather than trying to refactor the code further here. Given that this code does not change much, I'm inclined not to spend too much more time thinking about it.
@esc Could this have a buildfarm run prior to review please? I'm concerned that something might be up (because this has been hard to get right), and I'd rather make sure it works before review than get it approved and then discover a fatal flaw in the logic.
Moving from the 0.57 milestone pending future developments in the structure and distribution of the CUDA toolkit packages - once a clear route forward is visible this PR can be updated and moved into an appropriate milestone.
@stuartarchibald Many thanks for the review. The way forward for publishing CUDA toolkit packages on Anaconda.org is now resolved, and CUDA 12.0 packages are available on the NVIDIA channel. I have updated this PR with `main` and in response to the comments above, so it should be ready for another round of review.
A couple of points on testing: first, it would be good to test with gpuCI, but the driver version in our gpuCI setup doesn't support CUDA 12.0 yet, so that will have to be added in the future. Secondly, I've tested this locally with the NVIDIA CUDA 12.0 packages; the tests pass, and the library tests show correct detection:
```
$ python -c "from numba import cuda; cuda.cudadrv.libs.test()"
Finding driver from candidates: libcuda.so, libcuda.so.1, /usr/lib/libcuda.so, /usr/lib/libcuda.so.1, /usr/lib64/libcuda.so, /usr/lib64/libcuda.so.1...
Using loader <class 'ctypes.CDLL'>
trying to load driver... ok, loaded from libcuda.so
Finding nvvm from Conda environment (NVIDIA package)
located at /home/gmarkall/mambaforge/envs/numba-nvidia-channel/nvvm/lib64/libnvvm.so.4.0.0
trying to open library... ok
Finding cudart from Conda environment (NVIDIA package)
located at /home/gmarkall/mambaforge/envs/numba-nvidia-channel/lib/libcudart.so.12.0.146
trying to open library... ok
Finding cudadevrt from Conda environment (NVIDIA package)
located at /home/gmarkall/mambaforge/envs/numba-nvidia-channel/lib/libcudadevrt.a
Finding libdevice from Conda environment (NVIDIA package)
trying to open library... ok
```
If you want to test locally with the NVIDIA packages, you can install them with `conda install nvidia::cuda-toolkit=12`.
Let me know if you run into any issues in testing.
gpuci run tests
Thanks for the updates in 09d6fc1; they look good. I think this patch just needs a manual test and a run through the buildfarm.
gpuci run tests
@stuartarchibald As discussed OOB, I've added the supported CCs to `nvvm.py` for toolkits 12.0 and 12.1. These changes are related to this PR since the packages it adds support for are from 12.0 onwards. I also made a note in the docs that MVC is not supported on CUDA 12 (when I made the MVC PR, CUDA 12 was not yet released, so the docs said nothing about it at the time).
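
For context, a supported-CC table keyed by toolkit version might look roughly like the sketch below. The name `CTK_SUPPORTED_CC` and the exact compute-capability ranges (Maxwell through Hopper for 12.0/12.1) are assumptions for illustration, not the actual contents of `nvvm.py`.

```python
# Hypothetical sketch: supported compute capability range per CUDA toolkit
# version. Names and values are illustrative assumptions.
CTK_SUPPORTED_CC = {
    # toolkit version: (lowest supported CC, highest supported CC)
    (12, 0): ((5, 0), (9, 0)),
    (12, 1): ((5, 0), (9, 0)),
}


def cc_range_for_toolkit(version):
    """Return the (min, max) compute capability range for a toolkit
    version tuple, or None if the toolkit is not in the table."""
    return CTK_SUPPORTED_CC.get(version)
```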
Many thanks @gmarkall. I've tested this PR at bd92dd2 manually using CUDA Toolkit 12.1 conda packages from NVIDIA; all the CUDA unit tests pass, and `numba -s` correctly reports the use of the NVIDIA packages.
gpuci run tests
@stuartarchibald Many thanks, line-wrap change committed.
Buildfarm ID: `numba_smoketest_cuda_yaml_185`.
Passed.