[FEA]: Support for finding `libcudadevrt.a` through pathfinder

Open brandon-b-miller opened this issue 8 months ago • 7 comments

Is this a duplicate?

  • [x] I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct

Area

cuda.pathfinder

Is your feature request related to a problem? Please describe.

numba-cuda needs to be able to find libcudadevrt.a to launch kernels involving cooperative groups, but currently has to find it itself via get_cudalib. Ideally, pathfinder could provide this for us.

Describe the solution you'd like

Something like path_finder._get_nvidia_static_library("cudadevrt")

Describe alternatives you've considered

Numba's existing logic

Additional context

https://github.com/NVIDIA/numba-cuda/issues/302

brandon-b-miller, Jun 18 '25 13:06

> numba-cuda needs to be able to find libcudadevrt.a to launch kernels involving cooperative groups

@brandon-b-miller could you confirm whether this is true? AFAIK cooperative launch does not require libcudadevrt. The only CUDA features that need it are CDP1 and CDP2 (CUDA Dynamic Parallelism).

leofang, Jun 18 '25 17:06

In the relevant codepath, there's this note in the numba-cuda source code:

# We need to link against cudadevrt if grid sync is being used.

https://github.com/NVIDIA/numba-cuda/blob/main/numba_cuda/numba/cuda/dispatcher.py#L157-L161

brandon-b-miller, Jun 18 '25 17:06

That's why I would like us to test it locally before determining the priority, because we have the same grid.sync() test in the CI and I am pretty sure libcudadevrt is not needed for CG, only for CDP1/2 (which I added to CuPy) 🙂 https://github.com/NVIDIA/cuda-python/blob/24fde1755ad1efa6e7d73b2236dad9681fa0b727/cuda_core/tests/test_launcher.py#L157-L199

leofang, Jun 18 '25 18:06

Without adding libcudadevrt.a to the link, the cooperative groups tests fail for me locally:

python -m numba.runtests numba.cuda.tests.cudapy.test_cooperative_groups
pynvjitlink.api.NvJitLinkError: NVJITLINK_ERROR_INTERNAL error when calling nvJitLinkComplete
error   : Undefined reference to 'cudaCGGetIntrinsicHandle' in '<cudapy-ptx>'
ERROR 9: finish

brandon-b-miller, Jun 18 '25 18:06

It seems this is needed when -rdc=true is passed. Does numba-cuda do that always? I forgot.

leofang, Jun 18 '25 21:06

Hi @brandon-b-miller @gmarkall this is not urgent but it'd be nice to discuss this in the next sync-up. We recently also hit the error with undefined reference to cudaCGGetIntrinsicHandle in the CI. But it turns out that it is a known issue at least with NVRTC: https://github.com/NVIDIA/cuda-python/pull/720#discussion_r2173147444. Now, I understand Numba does not use NVRTC for CG: https://github.com/NVIDIA/numba-cuda/blob/main/numba_cuda/numba/cuda/cg.py, but perhaps the root cause is similar (a variant of version mismatching) and it'd be great to get to the bottom of it.

leofang, Jun 28 '25 06:06

Cursor-generated research (with manual edits); please comment if there are issues with this information.


Location of libcudadevrt.a / cudadevrt.lib

Linux (libcudadevrt.a)

| Source | CUDA 12 | CUDA 13 |
| --- | --- | --- |
| System CTK (/usr/local/cuda) | /usr/local/cuda/lib64/libcudadevrt.a | Same |
| pip wheel | nvidia/cuda_runtime/lib/libcudadevrt.a | nvidia/cu13/lib/libcudadevrt.a |
| Conda | $CONDA_PREFIX/lib/libcudadevrt.a | Same |

Wheel packages:

  • CUDA 12: nvidia-cuda-runtime-cu12
  • CUDA 13: nvidia-cuda-runtime (version 13.x)
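
One way to locate the file without hard-coding a directory layout is to ask the installed wheel which files it ships. The following is a minimal sketch, not part of cuda.pathfinder: `find_in_wheels` is a hypothetical helper name, the distribution names are the ones listed above, and the order in which they are tried is an assumption. It relies only on the standard library's `importlib.metadata`.

```python
from importlib import metadata


def find_in_wheels(filename, dist_names=("nvidia-cuda-runtime", "nvidia-cuda-runtime-cu12")):
    """Return the absolute path of `filename` from the first installed wheel that ships it.

    Hypothetical helper for illustration; the search order is an assumption.
    """
    for dist_name in dist_names:
        try:
            files = metadata.files(dist_name)
        except metadata.PackageNotFoundError:
            continue  # this wheel is not installed in the current environment
        # files() returns None when the distribution has no file record.
        for entry in files or []:
            if entry.name == filename:
                return str(entry.locate())
    return None
```

Calling `find_in_wheels("libcudadevrt.a")` on Linux, or `find_in_wheels("cudadevrt.lib")` on Windows, returns `None` when neither wheel is installed. The sketch does not check the installed version, so it cannot tell a 12.x wheel from a 13.x wheel by itself.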

Windows (cudadevrt.lib)

| Source | CUDA 12 | CUDA 13 |
| --- | --- | --- |
| System CTK | %CUDA_PATH%\lib\x64\cudadevrt.lib | Same |
| pip wheel | nvidia/cuda_runtime/lib/x64/cudadevrt.lib | nvidia/cu13/lib/x64/cudadevrt.lib |
| Conda | %CONDA_PREFIX%\Library\lib\x64\cudadevrt.lib | Same |

Key Observations

  1. CUDA 13 wheels use a new layout: nvidia/cu13/lib/ instead of nvidia/cuda_runtime/lib/

  2. Static library subdirectory differs from dynamic libs. For site-packages:

    • Dynamic libs (.so/.dll): nvidia/cuda_runtime/bin (Win) or nvidia/cuda_runtime/lib (Linux)
    • Static libs (.a/.lib): nvidia/cuda_runtime/lib (Linux) or nvidia/cuda_runtime/lib/x64 (Win)

  3. For System CTK on Linux: Both /usr/local/cuda/lib64/ and /usr/local/cuda/targets/x86_64-linux/lib/ contain libcudadevrt.a
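
The locations above could be probed with something like the following sketch. It is only an illustration of the layouts in the tables, not the pathfinder implementation: the function names are hypothetical, the search order (CUDA 13 wheel, CUDA 12 wheel, Conda, system CTK) is an assumption, and falling back to the `CUDA_HOME` environment variable is an assumption not mentioned in this thread.

```python
import os
import sys
import sysconfig
from pathlib import Path


def cudadevrt_candidates(site_packages, cuda_home=None, conda_prefix=None, windows=False):
    """Return candidate paths for the cudadevrt static library, in search order.

    The directory layouts mirror the tables above; the order is an assumption.
    """
    name = "cudadevrt.lib" if windows else "libcudadevrt.a"
    # Wheels keep static libs in lib/x64 on Windows and directly in lib on Linux.
    wheel_sub = Path("lib", "x64") if windows else Path("lib")
    candidates = []
    # pip wheels: CUDA 13 layout first, then CUDA 12 layout.
    for root in ("cu13", "cuda_runtime"):
        candidates.append(Path(site_packages) / "nvidia" / root / wheel_sub / name)
    if conda_prefix:
        conda_sub = Path("Library", "lib", "x64") if windows else Path("lib")
        candidates.append(Path(conda_prefix) / conda_sub / name)
    if cuda_home:
        ctk_sub = Path("lib", "x64") if windows else Path("lib64")
        candidates.append(Path(cuda_home) / ctk_sub / name)
    return candidates


def find_cudadevrt(site_packages=None, cuda_home=None, conda_prefix=None, windows=None):
    """Return the first existing candidate as a string, or None if nothing is found."""
    if site_packages is None:
        site_packages = sysconfig.get_paths()["purelib"]
    if cuda_home is None:
        # CUDA_PATH appears in the Windows table; CUDA_HOME is an assumed fallback.
        cuda_home = os.environ.get("CUDA_PATH") or os.environ.get("CUDA_HOME")
    if conda_prefix is None:
        conda_prefix = os.environ.get("CONDA_PREFIX")
    if windows is None:
        windows = sys.platform == "win32"
    for path in cudadevrt_candidates(site_packages, cuda_home, conda_prefix, windows):
        if path.is_file():
            return str(path)
    return None
```

Keeping the candidate list separate from the existence check makes the layout assumptions easy to inspect and test without a CUDA installation. A real implementation would probably also need to handle multiple site-packages directories and the targets/x86_64-linux/lib/ location noted above.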

Verified From

  • System CTK: /usr/local/cuda-12.9, /usr/local/cuda-13.0, /usr/local/cuda-13.1
  • pip wheels (Linux x86_64):
    • nvidia_cuda_runtime_cu12-12.9.79-py3-none-manylinux2014_x86_64.whl
    • nvidia_cuda_runtime-13.1.80-py3-none-manylinux2014_x86_64.whl
  • pip wheels (Windows):
    • nvidia_cuda_runtime_cu12-12.9.79-py3-none-win_amd64.whl
    • nvidia_cuda_runtime-13.1.80-py3-none-win_amd64.whl

rwgk, Jan 17 '26 01:01

Note: PR #1512 added a helper function under graph/test_device_launch.py to locate cudadevrt. Please update that if this issue is resolved.

Andy-Jost, Jan 22 '26 22:01