First version of `cuda.bindings.path_finder`
The third iteration of this PR description is:
Closes #453 (expected, but currently not tested)
This PR has these main aspects:
- All dynamic library loading (except libcuda) is moved from these Cython files
  - `cuda/bindings/_internal/nvjitlink_linux.pyx`
  - `cuda/bindings/_internal/nvjitlink_windows.pyx`
  - `cuda/bindings/_internal/nvvm_linux.pyx`
  - `cuda/bindings/_internal/nvvm_windows.pyx`

  to pure Python code under the new `cuda/bindings/_path_finder` directory.
The API for calling from Cython is simply:
path_finder.load_nvidia_dynamic_library("nvJitLink") # or "nvvm"
- `load_nvidia_dynamic_library()` first attempts to load the dynamic library using the system search features (rpath | LD_LIBRARY_PATH | PATH). If that succeeds, the handle to the library (a Python `int` on all platforms) is returned.
- Otherwise, `load_nvidia_dynamic_library()` calls `find_nvidia_dynamic_library()` to determine an absolute pathname for the dynamic library, then loads the library from that pathname. (A minimal sketch of this control flow follows this list.)
- `find_nvidia_dynamic_library()` first searches for the library under `site-packages/nvidia`, using `sys.path` to locate the site-packages directories, in order. If that fails, it uses a clone of `numba/cuda/cuda_paths.py` to search for the library. To pass all tests in the cuda-python CI, this trick is needed under Linux:
  https://github.com/rwgk/cuda-python/blob/eaeb8365d404076ed1a92f80172f5563ac8929e5/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py#L72-L79
  Here the last `/lib/` is replaced with `/lib64/` (or vice versa) in `get_cuda_paths()[name].info`, and both resulting paths are searched.
- The search for a library stops as soon as there is a match.
- `@functools.cache` is used for `load_nvidia_dynamic_library(name)`, therefore the involved search & load code is certain to be invoked only once per process, per library.
- `numba/cuda/cuda_paths.py` was changed as little as possible, so that it is feasible to keep our copy in sync with the original while they both exist. The idea is to work towards using `cuda.bindings.path_finder` from numba-cuda. (After that is achieved, the code under `cuda/bindings/_path_finder` can probably be refactored significantly.)
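For orientation, here is a minimal sketch of the control flow described above. This is not the actual implementation: it is Linux-only, uses `ctypes` purely for illustration, and the candidate path, soname, and helper bodies are hypothetical.

```python
import ctypes
import functools
import os


def _lib_lib64_variants(path: str) -> list[str]:
    # The CI trick: swap the last /lib/ <-> /lib64/ so that both are searched.
    for old, new in (("/lib64/", "/lib/"), ("/lib/", "/lib64/")):
        idx = path.rfind(old)
        if idx != -1:
            return [path, path[:idx] + new + path[idx + len(old):]]
    return [path]


def find_nvidia_dynamic_library(name: str) -> str:
    # Stand-in for the real search (site-packages/nvidia via sys.path, then
    # the cuda_paths.py clone); only the /lib/ vs /lib64/ handling is shown,
    # applied to a single hypothetical candidate path.
    candidate = f"/usr/local/cuda/lib/lib{name}.so"  # hypothetical
    for variant in _lib_lib64_variants(candidate):
        if os.path.isfile(variant):
            return variant  # the search stops at the first match
    raise RuntimeError(f"dynamic library for {name!r} not found")


@functools.cache  # search & load runs at most once per process, per library
def load_nvidia_dynamic_library(name: str) -> int:
    try:
        # 1. System search features (rpath | LD_LIBRARY_PATH | PATH).
        return ctypes.CDLL(f"lib{name}.so")._handle  # hypothetical soname
    except OSError:
        # 2. Fall back to the absolute pathname determined by the finder.
        return ctypes.CDLL(find_nvidia_dynamic_library(name))._handle
```

From Cython or Python, the call is then simply `path_finder.load_nvidia_dynamic_library("nvJitLink")`, and the returned `int` is usable as an OS library handle on all platforms.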
TODO
- The changes to the `.pyx` files need to be backported to the upstream code generator.
Deferred (follow-on PRs)
- `path_finder.load_nvidia_dynamic_library("nvrtc")` — This was attempted but backed out. Special handling of `nvrtc-builtins64_128.dll` is required, to replace or emulate this existing code.
The second iteration of this PR description was:
These commits expand the experiment to adopt the entire `numba/cuda/cuda_paths.py`:
- commit d31920ca07db52b2bd63810a017b19bf683a8ce1 — Copy from NVIDIA/numba-cuda#155 as-is (as of Tue Mar 18 09:29:19 2025 -0700)
- commit ed0ebb3117f4b3622b3b2d1ae7c80e90e3592800 — ruff format, no manual changes
- commit 0c5aca5da90a8d33382317c620bae2ba1cae5f7e — Minimal changes to replace external dependencies.
Example:
$ python tests/show_ecosystem_cuda_paths.py
nvvm: _env_path_tuple(by='CUDA_HOME', info='/usr/local/cuda/nvvm/lib64/libnvvm.so.4.0.0')
libdevice: _env_path_tuple(by='CUDA_HOME', info='/usr/local/cuda/nvvm/libdevice/libdevice.10.bc')
cudalib_dir: _env_path_tuple(by='CUDA_HOME', info='/usr/local/cuda/lib64')
static_cudalib_dir: _env_path_tuple(by='CUDA_HOME', info='/usr/local/cuda/lib64')
include_dir: _env_path_tuple(by='CUDA_INCLUDE_PATH Config Entry', info='/usr/local/cuda/include')
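The script `tests/show_ecosystem_cuda_paths.py` is not shown in this description. A minimal sketch that would produce output of this shape, assuming the module path below for the copied `cuda_paths.py` and its `get_cuda_paths()` function (which returns a dict of `_env_path_tuple` values), is:

```python
# Hypothetical sketch; the import path is an assumption.
from cuda.bindings._path_finder.cuda_paths import get_cuda_paths

for name, env_path_tuple in get_cuda_paths().items():
    print(f"{name}: {env_path_tuple}")
```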
The first iteration of this PR description was:
Experiment related to #441, triggered by this comment (by @kkraus14).
Context: Potentially use this code from `cuda_bindings/cuda/bindings/_internal/nvvm_linux.pyx`
This PR: Stripped-down (and ruff'ed) copies of:
- https://github.com/NVIDIA/numba-cuda/blob/bf487d78a40eea87f009d636882a5000a7524c95/numba_cuda/numba/cuda/cuda_paths.py
- https://github.com/numba/numba/blob/f0d24824fcd6a454827e3c108882395d00befc04/numba/misc/findlib.py
Tested interactively with:
import cuda_paths
nvvm_path = cuda_paths.get_nvvm_path()
print(f"{nvvm_path=}")
Output:
nvvm_path=_env_path_tuple(by='System', info='/usr/local/cuda/nvvm/lib64/libnvvm.so.4.0.0')
Advantage of this approach: Battle-tested and time-tested.
Disadvantages: TBD