[FEA]: Support finding `nvvm` from the system

Open brandon-b-miller opened this issue 4 months ago • 9 comments

Is this a duplicate?

  • [x] I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct

Area

cuda.pathfinder

Is your feature request related to a problem? Please describe.

numba-cuda currently supports running with only system packages, without any nvvm/nvrtc wheels. Today, I can't find nvvm on the system using cuda-pathfinder:

root@machine:/# ls /usr/local/cuda/nvvm/lib64/libnvvm.so
/usr/local/cuda/nvvm/lib64/libnvvm.so
>>> from cuda.pathfinder import load_nvidia_dynamic_lib
>>> load_nvidia_dynamic_lib('nvvm')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/cuda/pathfinder/_dynamic_libs/load_nvidia_dynamic_lib.py", line 140, in load_nvidia_dynamic_lib
    return _load_lib_no_cache(libname)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/cuda/pathfinder/_dynamic_libs/load_nvidia_dynamic_lib.py", line 57, in _load_lib_no_cache
    finder.raise_not_found_error()
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/cuda/pathfinder/_dynamic_libs/find_nvidia_dynamic_lib.py", line 210, in raise_not_found_error
    raise DynamicLibNotFoundError(f'Failure finding "{self.lib_searched_for}": {err}\n{att}')
cuda.pathfinder._dynamic_libs.load_dl_common.DynamicLibNotFoundError: Failure finding "libnvvm.so": No such file: libnvvm.so*, No such file: libnvvm.so*, No such file: libnvvm.so*, No such file: libnvvm.so*

Describe the solution you'd like

I'd like load_nvidia_dynamic_lib to be able to load the library at the above path.

Describe alternatives you've considered

numba-cuda currently implements this handling, so we could in theory keep maintaining it, but that would somewhat defeat the purpose of adopting pathfinder.

Additional context

No response

brandon-b-miller avatar Oct 20 '25 16:10 brandon-b-miller

With which version of pathfinder does this happen? I thought we had made a special case for nvvm precisely because it does not show up in the system search path. We also covered this case in the CI ($CUDA_PATH/nvvm/lib64/ is NOT added to $LD_LIBRARY_PATH).

leofang avatar Oct 21 '25 13:10 leofang

The error is present for 1.3.1. The issue may be reproduced inside

rapidsai/citestwheel:cuda12.9.1-ubuntu24.04-py3.12

with

pip install cuda-pathfinder==1.3.1

Then

root@eb59577e416b:/# find . -name "libnvvm.so"
./usr/local/cuda-12.9/nvvm/lib64/libnvvm.so
root@eb59577e416b:/# python
Python 3.12.11 (main, Sep 22 2025, 15:22:20) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from cuda.pathfinder import load_nvidia_dynamic_lib
>>> load_nvidia_dynamic_lib('nvvm')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/cuda/pathfinder/_dynamic_libs/load_nvidia_dynamic_lib.py", line 140, in load_nvidia_dynamic_lib
    return _load_lib_no_cache(libname)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/cuda/pathfinder/_dynamic_libs/load_nvidia_dynamic_lib.py", line 57, in _load_lib_no_cache
    finder.raise_not_found_error()
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/cuda/pathfinder/_dynamic_libs/find_nvidia_dynamic_lib.py", line 210, in raise_not_found_error
    raise DynamicLibNotFoundError(f'Failure finding "{self.lib_searched_for}": {err}\n{att}')
cuda.pathfinder._dynamic_libs.load_dl_common.DynamicLibNotFoundError: Failure finding "libnvvm.so": No such file: libnvvm.so*, No such file: libnvvm.so*, No such file: libnvvm.so*, No such file: libnvvm.so*

brandon-b-miller avatar Oct 21 '25 14:10 brandon-b-miller

> I thought we've made a special case for nvvm precisely because it does not show up in the system search path.

In the meantime (between #441 and #1038) it became clear to me that we need to tie the discovery of nvvm in some way to the discovery of another library, for the case where we reach the system-search stage but nvvm is not found there. That's actually what led me to propose the mechanism under #1038, although that mechanism is powerful beyond just finding nvvm in a consistent way.

As soon as I'm done with the CTK-next work, I'll jump on this. I'll probably solve this issue with a "scoped search lite" approach.

rwgk avatar Oct 21 '25 15:10 rwgk
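
One way to picture tying nvvm discovery to another library is sketched below. This is only an illustration and not the mechanism proposed under #1038: `find_nvvm_near` and `max_levels` are hypothetical names, and the `nvvm/lib64` layout is taken from the paths shown earlier in this issue. Given the path of a CUDA library that was already located, the sketch walks up a few directory levels and looks for `nvvm/lib64/libnvvm.so*` next to it.

```python
from pathlib import Path
from typing import Optional


def find_nvvm_near(anchor_lib: str, max_levels: int = 4) -> Optional[Path]:
    """Hypothetical helper (not part of cuda.pathfinder).

    Starting from the directory of an already-located CUDA library,
    check that directory and a few of its ancestors for
    nvvm/lib64/libnvvm.so*.
    """
    # Resolve symlinks first, e.g. /usr/local/cuda -> /usr/local/cuda-12.9.
    start = Path(anchor_lib).resolve().parent
    for root in [start, *start.parents][:max_levels]:
        nvvm_dir = root / "nvvm" / "lib64"
        if nvvm_dir.is_dir():
            matches = sorted(nvvm_dir.glob("libnvvm.so*"))
            if matches:
                return matches[0]
    return None
```

The appeal of this shape is consistency: whichever toolkit the anchor library came from is also the toolkit that libnvvm comes from. The default of four levels is an arbitrary choice for the sketch.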

This seems like another case where libraries are in /usr/local/cuda* but are not necessarily in the ldconfig and that the expectation is we still search and find them within there regardless

kkraus14 avatar Oct 27 '25 15:10 kkraus14
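
The expectation described above, searching under `/usr/local/cuda*` even when ldconfig knows nothing about the library, might look roughly like this. It is a sketch under assumptions: `glob_system_nvvm` is a hypothetical name, the `nvvm/lib64` subdirectory comes from the paths shown in this issue, and nothing here reflects how cuda.pathfinder actually searches.

```python
import glob
import os
from typing import List


def glob_system_nvvm(prefix: str = "/usr/local/cuda") -> List[str]:
    """Hypothetical helper (not part of cuda.pathfinder).

    Return the real paths of libnvvm.so* found under every directory
    whose name starts with `prefix`, e.g. /usr/local/cuda and
    /usr/local/cuda-12.9.
    """
    pattern = os.path.join(f"{prefix}*", "nvvm", "lib64", "libnvvm.so*")
    seen = {}
    for path in sorted(glob.glob(pattern)):
        # /usr/local/cuda is often a symlink to a versioned directory,
        # so deduplicate on the resolved path.
        seen.setdefault(os.path.realpath(path), path)
    return list(seen)
```

A glob like this can return several toolkits at once, and it says nothing about which one should win. That ambiguity is one reason a plain directory scan may not be enough on its own.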

@kkraus14 wrote:

> This seems like another case where libraries are in /usr/local/cuda* but are not necessarily in the ldconfig and that the expectation is we still search and find them within there regardless

Yes. I was thinking libnvvm is the only such case, by design.

Finding libnvvm is a long-established numba-cuda feature. Unless we are OK with a regression from a user perspective, we have to support it.

Tangential, but maybe useful for me to know long-term:

I'm only aware of one other such case, but we're considering that a bug:

https://github.com/NVIDIA/cuda-python/blob/53f5680dba2996cd4be93ed7ba701993015e058a/cuda_pathfinder/cuda/pathfinder/_dynamic_libs/load_dl_linux.py#L179-L195

Are there any other cases?

rwgk avatar Oct 27 '25 16:10 rwgk

> I'm only aware of one other such case, but we're considering that a bug:

I just looked around, and I think the bug that prompted us to add _work_around_known_bugs() has already been fixed.

I ran into a similar problem elsewhere (internal) and wrongly assumed it's still the same bug.

I still need to understand why exactly I was seeing that similar problem.

rwgk avatar Oct 27 '25 16:10 rwgk

> > This seems like another case where libraries are in /usr/local/cuda* but are not necessarily in the ldconfig and that the expectation is we still search and find them within there regardless

> Yes. I was thinking libnvvm is the only such case, by design.

Yes. But, this question still needs to be answered:

> We also covered this case in the CI ($CUDA_PATH/nvvm/lib64/ is NOT added to $LD_LIBRARY_PATH).

Why haven't we run into any issues so far in our local-ctk CI pipelines? My impression is that we already addressed this. It would be nice to understand this before implementing any fix. Maybe there is nothing to fix?

leofang avatar Nov 04 '25 03:11 leofang

> Why didn't we run into any issue so far, in our local-ctk CI pipelines?

They all set CUDA_HOME; libnvvm is found that way.

rwgk avatar Nov 04 '25 06:11 rwgk
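
To check whether a given environment would take the same route as the CI, a small diagnostic along these lines may help. It is a sketch only: `nvvm_from_env` is a hypothetical helper, and it checks both `CUDA_HOME` (mentioned above) and `CUDA_PATH` (mentioned earlier in the thread). Which variables cuda.pathfinder consults, and in what order, is not stated in this issue.

```python
import os
from typing import Mapping, Optional


def nvvm_from_env(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical diagnostic (not part of cuda.pathfinder).

    Return <root>/nvvm/lib64/libnvvm.so for the first of CUDA_HOME or
    CUDA_PATH that is set and actually contains that file, else None.
    """
    for var in ("CUDA_HOME", "CUDA_PATH"):
        root = env.get(var)
        if not root:
            continue
        candidate = os.path.join(root, "nvvm", "lib64", "libnvvm.so")
        if os.path.isfile(candidate):
            return candidate
    return None
```

In the reproduction container neither variable is shown as set, which would be consistent with the failure there. Exporting `CUDA_HOME=/usr/local/cuda-12.9` before calling `load_nvidia_dynamic_lib('nvvm')` would presumably follow the CI's path, though that has not been verified in this thread.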

> They all set CUDA_HOME, libnvvm is found that way.

I see, this is what I missed, thanks!

leofang avatar Nov 17 '25 02:11 leofang