Is this a duplicate?

[x] I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct

Area

cuda.pathfinder

Is your feature request related to a problem? Please describe.

numba-cuda needs to be able to find libcudadevrt.a to launch kernels involving cooperative groups, but currently it has to locate the library itself via get_cudalib. Ideally the pathfinder could provide this for us.

Describe the solution you'd like

Something like path_finder._get_nvidia_static_library("cudadevrt")

Describe alternatives you've considered

Numba's existing logic

Additional context

https://github.com/NVIDIA/numba-cuda/issues/302

Jun 18 '25 13:06 brandon-b-miller

numba-cuda needs to be able to find libcudadevrt.a to launch kernels involving cooperative groups

@brandon-b-miller could you confirm if this is true? AFAIK cooperative launch does not require libcudadevrt. The only CUDA features that need it are CDP1 and CDP2 (CUDA Dynamic Parallelism).

Jun 18 '25 17:06 leofang

In the relevant codepath, there's this note in the numba-cuda source code:

# We need to link against cudadevrt if grid sync is being used.

https://github.com/NVIDIA/numba-cuda/blob/main/numba_cuda/numba/cuda/dispatcher.py#L157-L161

Jun 18 '25 17:06 brandon-b-miller

That's why I would like us to test it locally before determining the priority, because we have the same grid.sync() test in the CI and I am pretty sure libcudadevrt is not needed for CG, only for CDP1/2 (which I added to CuPy) 🙂 https://github.com/NVIDIA/cuda-python/blob/24fde1755ad1efa6e7d73b2236dad9681fa0b727/cuda_core/tests/test_launcher.py#L157-L199

Jun 18 '25 18:06 leofang

Without adding libcudadevrt.a to the link, the cooperative groups tests fail for me locally:

python -m numba.runtests numba.cuda.tests.cudapy.test_cooperative_groups

pynvjitlink.api.NvJitLinkError: NVJITLINK_ERROR_INTERNAL error when calling nvJitLinkComplete
error   : Undefined reference to 'cudaCGGetIntrinsicHandle' in '<cudapy-ptx>'
ERROR 9: finish

Jun 18 '25 18:06 brandon-b-miller

It seems this is needed when -rdc=true is passed. Does numba-cuda do that always? I forgot.

Jun 18 '25 21:06 leofang

Hi @brandon-b-miller @gmarkall this is not urgent but it'd be nice to discuss this in the next sync-up. We recently also hit the error with undefined reference to cudaCGGetIntrinsicHandle in the CI. But it turns out that it is a known issue at least with NVRTC: https://github.com/NVIDIA/cuda-python/pull/720#discussion_r2173147444. Now, I understand Numba does not use NVRTC for CG: https://github.com/NVIDIA/numba-cuda/blob/main/numba_cuda/numba/cuda/cg.py, but perhaps the root cause is similar (a variant of version mismatching) and it'd be great to get to the bottom of it.

Jun 28 '25 06:06 leofang

Cursor-generated research (with manual edits). Please comment if there are issues with this information.

Location of `libcudadevrt.a` / `cudadevrt.lib`

Linux (`libcudadevrt.a`)

| Source | CUDA 12 | CUDA 13 |
| --- | --- | --- |
| System CTK (`/usr/local/cuda`) | `/usr/local/cuda/lib64/libcudadevrt.a` | Same |
| pip wheel | `nvidia/cuda_runtime/lib/libcudadevrt.a` | `nvidia/cu13/lib/libcudadevrt.a` |
| Conda | `$CONDA_PREFIX/lib/libcudadevrt.a` | Same |

Wheel packages:

CUDA 12: nvidia-cuda-runtime-cu12
CUDA 13: nvidia-cuda-runtime (version 13.x)

Windows (`cudadevrt.lib`)

| Source | CUDA 12 | CUDA 13 |
| --- | --- | --- |
| System CTK | `%CUDA_PATH%\lib\x64\cudadevrt.lib` | Same |
| pip wheel | `nvidia/cuda_runtime/lib/x64/cudadevrt.lib` | `nvidia/cu13/lib/x64/cudadevrt.lib` |
| Conda | `%CONDA_PREFIX%\Library\lib\x64\cudadevrt.lib` | Same |

Key Observations

- CUDA 13 wheels use a new layout: `nvidia/cu13/lib/` instead of `nvidia/cuda_runtime/lib/`.
- The static library subdirectory differs from that of the dynamic libs. For site-packages:
  - Dynamic libs (.so/.dll): `nvidia/cuda_runtime/bin` (Win) or `nvidia/cuda_runtime/lib` (Linux)
  - Static libs (.a/.lib): `nvidia/cuda_runtime/lib` (Linux) or `nvidia/cuda_runtime/lib/x64` (Win)
- For the system CTK on Linux, both `/usr/local/cuda/lib64/` and `/usr/local/cuda/targets/x86_64-linux/lib/` contain `libcudadevrt.a`.
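
To make the layouts above concrete, here is a minimal Python sketch of how a lookup over those locations might work. It is not the cuda.pathfinder API or Numba's existing logic: the function names are hypothetical, the search order (pip wheel, then Conda, then system CTK) is an assumption, and reading the CTK root from `CUDA_HOME` or `CUDA_PATH` is also an assumption. Only the directory layouts come from the research above.

```python
import os
import site
import sys
from pathlib import Path


def cudadevrt_candidates(site_packages, cuda_root=None, conda_prefix=None, windows=False):
    """List candidate paths for the cudadevrt static library.

    Hypothetical helper: the order (wheel, Conda, system CTK) is an assumption.
    """
    name = "cudadevrt.lib" if windows else "libcudadevrt.a"
    candidates = []
    # pip wheels: try the CUDA 13 layout (nvidia/cu13), then the CUDA 12 layout.
    for sp in site_packages:
        for pkg in ("cu13", "cuda_runtime"):
            lib = Path(sp) / "nvidia" / pkg / "lib"
            candidates.append(lib / "x64" / name if windows else lib / name)
    # Conda environment.
    if conda_prefix:
        prefix = Path(conda_prefix)
        candidates.append(
            prefix / "Library" / "lib" / "x64" / name if windows else prefix / "lib" / name
        )
    # System CTK.
    if cuda_root:
        root = Path(cuda_root)
        candidates.append(root / "lib" / "x64" / name if windows else root / "lib64" / name)
    return candidates


def find_cudadevrt(site_packages, cuda_root=None, conda_prefix=None, windows=False):
    """Return the first candidate that exists on disk, or None."""
    for path in cudadevrt_candidates(site_packages, cuda_root, conda_prefix, windows):
        if path.is_file():
            return path
    return None


def find_cudadevrt_from_environment():
    """Convenience wrapper; the environment variable names are assumptions."""
    cuda_root = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")
    return find_cudadevrt(
        site.getsitepackages(),
        cuda_root=cuda_root,
        conda_prefix=os.environ.get("CONDA_PREFIX"),
        windows=sys.platform == "win32",
    )
```

Keeping the candidate list separate from the existence check makes the layout rules easy to unit test without a CUDA installation.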

Verified From

System CTK: /usr/local/cuda-12.9, /usr/local/cuda-13.0, /usr/local/cuda-13.1
pip wheels (Linux x86_64):
- nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.whl
- nvidia_cuda_runtime-13.1.80-py3-none-manylinux2014_x86_64.whl
pip wheels (Windows):
- nvidia_cuda_runtime_cu12-12.9.79-py3-none-win_amd64.whl
- nvidia_cuda_runtime-13.1.80-py3-none-win_amd64.whl

Jan 17 '26 01:01 rwgk

Note: PR #1512 added a helper function under graph/test_device_launch.py to locate cudadevrt. Please update that if this issue is resolved.

Jan 22 '26 22:01 Andy-Jost

[FEA]: Support for finding `libcudadevrt.a` through pathfinder
