CUDA: Support CUDA Toolkit conda packages from NVIDIA

Open gmarkall opened this issue 3 years ago • 2 comments

NVIDIA now publishes conda packages containing the CUDA toolkit: https://anaconda.org/nvidia

These packages place components in different locations to the Anaconda- and conda-forge-maintained packages. This PR updates Numba's library search logic so that it can find the libraries in these packages. The locations of the components in these packages are (all relative to $CONDA_PREFIX):

  • NVVM is placed in nvvm/lib64 on Linux and nvvm/bin on Windows
  • Static CUDA libraries are in lib on Linux, and Lib/x64 on Windows
  • Dynamic CUDA libraries are in lib on Linux, and bin on Windows
  • Libdevice is in nvvm/libdevice
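
As a rough illustration of the layout listed above, the directories can be derived from the environment prefix as follows. This is a minimal sketch and not the code in cuda_paths.py: the function name `nvidia_conda_dirs` and the dictionary keys are hypothetical, chosen only for this example.

```python
import os
import sys


def nvidia_conda_dirs(prefix, platform=sys.platform):
    """Return the NVIDIA-package directories under a conda prefix.

    Hypothetical helper for illustration only; it encodes the four
    locations from the list above and nothing else.
    """
    is_windows = platform.startswith("win")
    return {
        # NVVM: nvvm/lib64 on Linux, nvvm/bin on Windows
        "nvvm": os.path.join(prefix, "nvvm", "bin" if is_windows else "lib64"),
        # Static CUDA libraries: lib on Linux, Lib/x64 on Windows
        "static_libs": (
            os.path.join(prefix, "Lib", "x64")
            if is_windows
            else os.path.join(prefix, "lib")
        ),
        # Dynamic CUDA libraries: lib on Linux, bin on Windows
        "dynamic_libs": os.path.join(prefix, "bin" if is_windows else "lib"),
        # Libdevice: nvvm/libdevice on both platforms
        "libdevice": os.path.join(prefix, "nvvm", "libdevice"),
    }
```

Calling `nvidia_conda_dirs(os.environ["CONDA_PREFIX"])` inside an activated environment gives the directories to check for this package layout.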

I also noticed that locating libcudadevrt was broken when using a non-conda-installed toolkit on Windows - this is also resolved by this PR.

Along with this, I thought it would be helpful to display the locations searched for libcuda.so, since its location can be an issue (as in #7104). This PR shows the search locations and the path used for a successful load. Unfortunately that path is sometimes relative, with the actual path determined by the system loader. There is no easy way to check the absolute path of the loaded library, but something more complex to report the exact path could be added in a future PR. A little refactoring was needed in cudadrv.py to separate the path determination from the actual loading, so that the path info could be presented by libs.test().
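
The idea of separating path determination from loading can be sketched as below. This is an illustration under stated assumptions, not the refactored code in this PR: `find_driver_candidates` and `load_first` are hypothetical names, and the candidate list covers Linux only.

```python
import ctypes


def find_driver_candidates():
    """Return candidate names and paths for libcuda; nothing is loaded here."""
    names = ["libcuda.so", "libcuda.so.1"]
    dirs = ["/usr/lib", "/usr/lib64"]
    # Bare names come first so the system loader resolves them,
    # followed by absolute paths in common library directories.
    return names + [f"{d}/{n}" for d in dirs for n in names]


def load_first(candidates, loader=ctypes.CDLL):
    """Try each candidate in order; return (library, candidate_used)."""
    for path in candidates:
        try:
            return loader(path), path
        except OSError:
            # This candidate could not be loaded; try the next one.
            continue
    raise OSError("could not load any of: " + ", ".join(candidates))
```

Because the candidate list is computed separately, a diagnostic routine can print it before attempting a load. When the candidate that succeeds is a bare name such as libcuda.so, the reported path is relative, which is the limitation described above.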

The logic additions in cuda_paths.py are a little unwieldy and contrived. Unfortunately, as the file has evolved, things have got a little out of control, and it's difficult to get this logic both correct and minimal. I've left it as it is for now rather than trying to refactor it further. Given that this code does not change much, I'm inclined not to spend too much more time thinking about it.

gmarkall avatar Jul 27 '21 15:07 gmarkall

@esc Could this have a buildfarm run prior to review please? I'm concerned that something might be up (because this has been hard to get right) and I'd rather make sure it works before the review, rather than getting it approved then discovering a fatal flaw in the logic.

gmarkall avatar Jul 27 '21 15:07 gmarkall

Moving from the 0.57 milestone pending future developments in the structure and distribution of the CUDA toolkit packages - once a clear route forward is visible this PR can be updated and moved into an appropriate milestone.

gmarkall avatar Jul 19 '22 14:07 gmarkall

@stuartarchibald Many thanks for the review. The way forward for publishing CUDA toolkit packages on Anaconda.org is now resolved, and CUDA 12.0 packages are available on the NVIDIA channel. I have merged main into this PR and addressed the comments above, so it should be ready for another round of review.

A couple of points on testing. First, it would be good to test with gpuCI, but the driver version in our gpuCI setup doesn't support CUDA 12.0 yet, so that will have to be added in future. Second, I've tested this locally with the NVIDIA CUDA 12.0 packages: the tests pass, and the library tests show correct detection:

$ python -c "from numba import cuda; cuda.cudadrv.libs.test()"
Finding driver from candidates: libcuda.so, libcuda.so.1, /usr/lib/libcuda.so, /usr/lib/libcuda.so.1, /usr/lib64/libcuda.so, /usr/lib64/libcuda.so.1...
Using loader <class 'ctypes.CDLL'>
	trying to load driver...	ok, loaded from libcuda.so
Finding nvvm from Conda environment (NVIDIA package)
	located at /home/gmarkall/mambaforge/envs/numba-nvidia-channel/nvvm/lib64/libnvvm.so.4.0.0
	trying to open library...	ok
Finding cudart from Conda environment (NVIDIA package)
	located at /home/gmarkall/mambaforge/envs/numba-nvidia-channel/lib/libcudart.so.12.0.146
	trying to open library...	ok
Finding cudadevrt from Conda environment (NVIDIA package)
	located at /home/gmarkall/mambaforge/envs/numba-nvidia-channel/lib/libcudadevrt.a
Finding libdevice from Conda environment (NVIDIA package)
	trying to open library...	ok

If you want to test locally with the NVIDIA packages, you can install them with:

conda install nvidia::cuda-toolkit=12

Let me know if you run into any issues in testing.

gmarkall avatar Feb 14 '23 10:02 gmarkall

gpuci run tests

gmarkall avatar Feb 14 '23 10:02 gmarkall

Thanks for the updates in 09d6fc1, they look good. I think this patch just needs a manual test and a run through the buildfarm.

stuartarchibald avatar Feb 24 '23 16:02 stuartarchibald

gpuci run tests

gmarkall avatar Mar 08 '23 11:03 gmarkall

@stuartarchibald As discussed OOB, I've added the supported compute capabilities (CCs) to nvvm.py for toolkits 12.0 and 12.1. These changes are related to this PR, since the packages it adds support for are from 12.0 onwards. I also made a note in the docs that minor version compatibility (MVC) is not supported on CUDA 12 (when I made the MVC PR, CUDA 12 was not out yet, so the docs said nothing about it at the time).

gmarkall avatar Mar 08 '23 11:03 gmarkall

Many thanks @gmarkall. I've tested this PR at bd92dd2 manually using CUDA Toolkit 12.1 conda packages from NVIDIA. All the CUDA unit tests pass, and numba -s correctly reports the use of NVIDIA packages.

stuartarchibald avatar Mar 08 '23 11:03 stuartarchibald

gpuci run tests

gmarkall avatar Mar 08 '23 11:03 gmarkall

@stuartarchibald Many thanks, line-wrap change committed.

gmarkall avatar Mar 08 '23 11:03 gmarkall

Buildfarm ID: numba_smoketest_cuda_yaml_185.

stuartarchibald avatar Mar 08 '23 11:03 stuartarchibald

Passed.

stuartarchibald avatar Mar 08 '23 12:03 stuartarchibald