Build fails with nvshmem installed as RPM in /usr/lib64 — unable to create wheel
Description
When trying to build this package against nvshmem provided as an RPM (installed under /usr/lib64 and /usr/include/nvshmem_${CUDA_MAJOR_VERSION}), the wheel build fails due to incorrect linking flags and library discovery issues.
Currently, the setup.py assumes local paths like ${nvshmem_dir}/lib and ${nvshmem_dir}/include. This breaks in RPM-based environments where nvshmem is installed system-wide.
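For illustration, the current assumption has roughly this shape (a sketch only, not the actual setup.py; the `/opt/nvshmem` default is hypothetical). On an RPM-based system there is no single such prefix, so this discovery scheme cannot find anything:

```python
import os

# Sketch of single-prefix discovery: everything is derived from one nvshmem_dir.
# On RPM installs, headers live in /usr/include/nvshmem_<CUDA_MAJOR_VERSION> and
# libraries in /usr/lib64 and /usr/lib64/nvshmem/<CUDA_MAJOR_VERSION>, so neither
# of the derived paths exists.
nvshmem_dir = os.getenv("NVSHMEM_DIR", "/opt/nvshmem")  # however nvshmem_dir is obtained; default is hypothetical
include_dirs = [os.path.join(nvshmem_dir, "include")]
library_dirs = [os.path.join(nvshmem_dir, "lib")]
```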
Issues observed
- Link flags not generic
  - Existing `extra_link_args` used explicit `-l:libnvshmem_host.so` and `-l:libnvshmem_device.a`.
  - These are brittle because they rely on filenames instead of sonames (`-lnvshmem_host`, `-lnvshmem_device`).
  - Also, `nvshmem_bootstrap_uid.so` was incorrectly linked, but it is not actually required.
- RPATH handling
  - Previously only `${nvshmem_dir}/lib` was added to `-rpath`.
  - On RPM-based installs, libraries are under `/usr/lib64/nvshmem/${CUDA_MAJOR_VERSION}` and `/usr/lib64`, requiring explicit rpath entries.
- Device linking
  - `nvcc_dlink` was missing system paths for `nvshmem_device`, causing unresolved references during device code linking.
- Wheel creation fails
  - Since the build cannot resolve `nvshmem_host` and `nvshmem_device` properly, `pip wheel .` fails to produce a wheel on systems where nvshmem is installed as an RPM.
  - Error messages include a missing `libnvshmem_host.so.3` and unresolved device symbols.
Changes needed
- Use system include and library directories:

  ```python
  include_dirs.extend([
      '/usr/include',
      f'/usr/include/nvshmem_{os.getenv("CUDA_MAJOR_VERSION")}',
  ])
  library_dirs.extend([
      '/usr/lib64',
      f'/usr/lib64/nvshmem/{os.getenv("CUDA_MAJOR_VERSION")}',
  ])
  ```

- Update linker flags to use sonames instead of filenames:

  ```python
  extra_link_args.extend([
      '-lnvshmem',
      '-Wl,--no-as-needed',
      '-lnvshmem_host',
      '-lnvshmem_device',
      f'-Wl,-rpath,/usr/lib64/nvshmem/{os.getenv("CUDA_MAJOR_VERSION")}:/usr/lib64',
  ])
  ```

- Update device linking with `nvcc`:

  ```python
  nvcc_dlink.extend([
      '-dlink',
      '-L/usr/lib64',
      f'-L/usr/lib64/nvshmem/{os.getenv("CUDA_MAJOR_VERSION")}',
      '-lnvshmem_device',
  ])
  ```
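One thing to note: all three snippets assume `CUDA_MAJOR_VERSION` is set in the build environment. A small guard along these lines (an illustrative sketch, not part of the current code) would surface a missing value up front instead of producing paths containing the literal string `None`:

```python
import os

cuda_major = os.getenv("CUDA_MAJOR_VERSION")
if cuda_major is None:
    # Without this check the f-strings above would quietly expand to paths like
    # /usr/include/nvshmem_None and the failure would only show up at link time.
    raise RuntimeError(
        "CUDA_MAJOR_VERSION must be set when building against the nvshmem RPM layout"
    )
```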
Expected outcome
- Build succeeds when `nvshmem` is installed from RPM.
- Wheel (`.whl`) can be created and installed in a clean environment.
- Linker resolves `libnvshmem_host.so.3` and `libnvshmem_device` dynamically without hardcoding filenames.
To make the build process more portable and user-configurable, I propose that we update setup.py to use environment variables for discovering NVSHMEM paths and linker flags.
We could adopt an approach that prioritizes explicit environment variables, with a fallback mechanism for backward compatibility.
New Environment Variables:
- `NVSHMEM_INCLUDE_PATH`: An `os.pathsep`-separated list of include directories for the NVSHMEM headers.
- `NVSHMEM_LIBRARY_PATH`: An `os.pathsep`-separated list of library directories for the NVSHMEM libraries.
- `NVSHMEM_LDFLAGS`: A string containing all necessary linker flags (e.g., `-lnvshmem -Wl,--no-as-needed -lnvshmem_host -lnvshmem_device -Wl,-rpath,...`).
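A sketch of how setup.py might read these variables (the `_split_paths` helper is hypothetical, and parsing `NVSHMEM_LDFLAGS` with `shlex.split` is an assumption so that quoted flags are handled correctly):

```python
import os
import shlex

def _split_paths(value):
    # Split an os.pathsep-separated list of directories, dropping empty entries.
    return [p for p in (value or "").split(os.pathsep) if p]

nvshmem_include_dirs = _split_paths(os.getenv("NVSHMEM_INCLUDE_PATH"))
nvshmem_library_dirs = _split_paths(os.getenv("NVSHMEM_LIBRARY_PATH"))
nvshmem_ldflags = shlex.split(os.getenv("NVSHMEM_LDFLAGS", ""))
```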
Implement Fallback Logic:
- To maintain compatibility with existing setups, if `NVSHMEM_INCLUDE_PATH` is not defined but `NVSHMEM_DIR` is, set `NVSHMEM_INCLUDE_PATH = f"{NVSHMEM_DIR}/include"`.
- Similarly, if `NVSHMEM_LIBRARY_PATH` is not defined but `NVSHMEM_DIR` is, set `NVSHMEM_LIBRARY_PATH = f"{NVSHMEM_DIR}/lib"`.
Conditional Compilation:
The NVSHMEM extension should only be built if both NVSHMEM_INCLUDE_PATH and NVSHMEM_LIBRARY_PATH are successfully resolved (either directly or via the NVSHMEM_DIR fallback). This prevents build failures when NVSHMEM is not available or configured.
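Putting the fallback and the conditional build together, it could look roughly like this (a sketch under the assumptions above; the extension name and source path are hypothetical placeholders, not this package's actual layout):

```python
import os
import shlex
from setuptools import Extension

ext_modules = []

# Fallback: derive the new variables from NVSHMEM_DIR when they are unset,
# so existing NVSHMEM_DIR-based setups keep working unchanged.
nvshmem_dir = os.getenv("NVSHMEM_DIR")
include_path = os.getenv("NVSHMEM_INCLUDE_PATH") or (
    f"{nvshmem_dir}/include" if nvshmem_dir else None
)
library_path = os.getenv("NVSHMEM_LIBRARY_PATH") or (
    f"{nvshmem_dir}/lib" if nvshmem_dir else None
)
ldflags = os.getenv("NVSHMEM_LDFLAGS", "")

# Conditional compilation: only declare the NVSHMEM extension when both path
# variables resolved (directly or via the fallback); otherwise skip it so the
# rest of the package still builds without NVSHMEM.
if include_path and library_path:
    ext_modules.append(
        Extension(
            "nvshmem_ext",                     # hypothetical extension name
            sources=["csrc/nvshmem_ext.cpp"],  # hypothetical source file
            include_dirs=include_path.split(os.pathsep),
            library_dirs=library_path.split(os.pathsep),
            extra_link_args=shlex.split(ldflags),
        )
    )
```

With this in place, setting only `NVSHMEM_DIR` behaves as it does today, while RPM-based environments can point `NVSHMEM_INCLUDE_PATH`/`NVSHMEM_LIBRARY_PATH` at the system locations and pass the soname-based flags through `NVSHMEM_LDFLAGS`.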
Note
I've done similar work in flashinfer-python: https://github.com/flashinfer-ai/flashinfer/commit/ce68e1d0cc8a69da4ead85a5280a183f3e2a5a00