[FEA]: Support for finding `libcudadevrt.a` through pathfinder
Is this a duplicate?
- [x] I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct
Area
cuda.pathfinder
Is your feature request related to a problem? Please describe.
numba-cuda needs to find `libcudadevrt.a` to launch kernels involving cooperative groups, and currently has to locate it on its own via `get_cudalib`. Ideally the pathfinder could provide this for us.
Describe the solution you'd like
Something like path_finder._get_nvidia_static_library("cudadevrt")
Describe alternatives you've considered
Numba's existing logic
Additional context
https://github.com/NVIDIA/numba-cuda/issues/302
> numba-cuda needs to be able to find libcudadevrt.a to launch kernels involving cooperative groups
@brandon-b-miller could you confirm if this is true? AFAIK cooperative launch does not require libcudadevrt. The only CUDA features that need it are CDP1 and CDP2 (CUDA Dynamic Parallelism).
In the relevant codepath, there's this note in the numba-cuda source code:
# We need to link against cudadevrt if grid sync is being used.
https://github.com/NVIDIA/numba-cuda/blob/main/numba_cuda/numba/cuda/dispatcher.py#L157-L161
That's why I would like us to test it locally before determining the priority, because we have the same grid.sync() test in the CI and I am pretty sure libcudadevrt is not needed for CG, only for CDP1/2 (which I added to CuPy) 🙂
https://github.com/NVIDIA/cuda-python/blob/24fde1755ad1efa6e7d73b2236dad9681fa0b727/cuda_core/tests/test_launcher.py#L157-L199
Without adding libcudadevrt.a to the link, the cooperative groups tests fail for me locally:
python -m numba.runtests numba.cuda.tests.cudapy.test_cooperative_groups
pynvjitlink.api.NvJitLinkError: NVJITLINK_ERROR_INTERNAL error when calling nvJitLinkComplete
error : Undefined reference to 'cudaCGGetIntrinsicHandle' in '<cudapy-ptx>'
ERROR 9: finish
It seems this is needed when -rdc=true is passed. Does numba-cuda do that always? I forgot.
Hi @brandon-b-miller @gmarkall this is not urgent but it'd be nice to discuss this in the next sync-up. We recently also hit the error with undefined reference to cudaCGGetIntrinsicHandle in the CI. But it turns out that it is a known issue at least with NVRTC: https://github.com/NVIDIA/cuda-python/pull/720#discussion_r2173147444. Now, I understand Numba does not use NVRTC for CG: https://github.com/NVIDIA/numba-cuda/blob/main/numba_cuda/numba/cuda/cg.py, but perhaps the root cause is similar (a variant of version mismatching) and it'd be great to get to the bottom of it.
Cursor-generated research (with manual edits), please comment if there are issues with this information.
Location of libcudadevrt.a / cudadevrt.lib
Linux (libcudadevrt.a)
| Source | CUDA 12 | CUDA 13 |
|---|---|---|
| System CTK (`/usr/local/cuda`) | `/usr/local/cuda/lib64/libcudadevrt.a` | Same |
| pip wheel | `nvidia/cuda_runtime/lib/libcudadevrt.a` | `nvidia/cu13/lib/libcudadevrt.a` |
| Conda | `$CONDA_PREFIX/lib/libcudadevrt.a` | Same |
Wheel packages:
- CUDA 12: `nvidia-cuda-runtime-cu12`
- CUDA 13: `nvidia-cuda-runtime` (version 13.x)
Windows (cudadevrt.lib)
| Source | CUDA 12 | CUDA 13 |
|---|---|---|
| System CTK | `%CUDA_PATH%\lib\x64\cudadevrt.lib` | Same |
| pip wheel | `nvidia/cuda_runtime/lib/x64/cudadevrt.lib` | `nvidia/cu13/lib/x64/cudadevrt.lib` |
| Conda | `%CONDA_PREFIX%\Library\lib\x64\cudadevrt.lib` | Same |
Key Observations
- CUDA 13 wheels use a new layout: `nvidia/cu13/lib/` instead of `nvidia/cuda_runtime/lib/`
- Static library subdirectory differs from dynamic libs. For site-packages:
  - Dynamic libs (`.so`/`.dll`): `nvidia/cuda_runtime/bin` (Win) or `nvidia/cuda_runtime/lib` (Linux)
  - Static libs (`.a`/`.lib`): `nvidia/cuda_runtime/lib` (Linux) or `nvidia/cuda_runtime/lib/x64` (Win)
- For System CTK on Linux: both `/usr/local/cuda/lib64/` and `/usr/local/cuda/targets/x86_64-linux/lib/` contain `libcudadevrt.a`
Verified From
- System CTK: `/usr/local/cuda-12.9`, `/usr/local/cuda-13.0`, `/usr/local/cuda-13.1`
- pip wheels (Linux x86_64):
  - `nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.whl`
  - `nvidia_cuda_runtime-13.1.80-py3-none-manylinux2014_x86_64.whl`
- pip wheels (Windows):
  - `nvidia_cuda_runtime_cu12-12.9.79-py3-none-win_amd64.whl`
  - `nvidia_cuda_runtime-13.1.80-py3-none-win_amd64.whl`
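To make the locations above concrete, here is a rough sketch of how a lookup could walk them. This is not the pathfinder implementation: `candidate_cudadevrt_paths` and `find_cudadevrt` are hypothetical names, the search order (pip wheels, then Conda, then system CTK) is an assumption, and the Linux `targets/x86_64-linux` directory is hardcoded purely for illustration.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def candidate_cudadevrt_paths(
    site_packages: Iterable[str],
    cuda_home: Optional[str] = None,
    conda_prefix: Optional[str] = None,
    windows: bool = False,
) -> List[Path]:
    """List plausible cudadevrt locations (hypothetical helper, assumed order)."""
    name = "cudadevrt.lib" if windows else "libcudadevrt.a"
    candidates: List[Path] = []

    # pip wheels: try the CUDA 13 layout (nvidia/cu13) before the
    # CUDA 12 layout (nvidia/cuda_runtime). Windows adds an x64 subdirectory.
    for sp in site_packages:
        for pkg in ("cu13", "cuda_runtime"):
            lib_dir = Path(sp) / "nvidia" / pkg / "lib"
            candidates.append(lib_dir / "x64" / name if windows else lib_dir / name)

    # Conda environment.
    if conda_prefix:
        prefix = Path(conda_prefix)
        if windows:
            candidates.append(prefix / "Library" / "lib" / "x64" / name)
        else:
            candidates.append(prefix / "lib" / name)

    # System CTK (e.g. /usr/local/cuda on Linux, %CUDA_PATH% on Windows).
    if cuda_home:
        home = Path(cuda_home)
        if windows:
            candidates.append(home / "lib" / "x64" / name)
        else:
            candidates.append(home / "lib64" / name)
            candidates.append(home / "targets" / "x86_64-linux" / "lib" / name)

    return candidates


def find_cudadevrt(**kwargs) -> Optional[Path]:
    """Return the first candidate that exists on disk, or None."""
    for path in candidate_cudadevrt_paths(**kwargs):
        if path.is_file():
            return path
    return None
```

A caller could pass `site.getsitepackages()`, `os.environ.get("CUDA_HOME")` (or `CUDA_PATH` on Windows), and `os.environ.get("CONDA_PREFIX")`, and set `windows=sys.platform == "win32"`. Returning `None` rather than raising leaves the error policy to the caller.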
Note: PR #1512 added a helper function under graph/test_device_launch.py to locate cudadevrt. Please update that if this issue is resolved.