
Is the error "Error string: /lib64/libcuda.so.1: undefined symbol: cuIpcOpenMemHandle_v2 CUDA-aware support is disabled." caused by the nv_peer_mem or nvidia-peermem module being unavailable in the NVIDIA driver?

Open Dcn303 opened this issue 5 months ago • 2 comments

Hello everyone, I am facing the error shown below:

An error occurred while trying to map in the address of a function.
  Function Name: cuIpcOpenMemHandle_v2
  Error string:  /lib64/libcuda.so.1: undefined symbol: cuIpcOpenMemHandle_v2
CUDA-aware support is disabled.
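To see which IPC entry points the installed driver library actually exports, the symbol can be looked up directly, roughly the dlopen/dlsym lookup the message describes. This is a minimal Python sketch using ctypes, assuming Linux and the `/lib64/libcuda.so.1` path from the message; `probe_symbols` and `driver_version` are hypothetical helpers, not part of Open MPI. `cuDriverGetVersion` reports the newest CUDA version the driver supports, encoded as 1000*major + 10*minor (11080 means 11.8).

```python
import ctypes

CUDA_SUCCESS = 0  # value of CUDA_SUCCESS in the CUresult enum


def probe_symbols(libname, symbols):
    """Return {symbol: found?} for a shared library, or None if it cannot be loaded."""
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return None
    # ctypes resolves names with dlsym(); a missing symbol raises AttributeError,
    # which hasattr() turns into False.
    return {name: hasattr(lib, name) for name in symbols}


def driver_version(libname="libcuda.so.1"):
    """Return cuDriverGetVersion()'s value (1000*major + 10*minor), or None."""
    try:
        lib = ctypes.CDLL(libname)
        get_version = lib.cuDriverGetVersion
    except (OSError, AttributeError):
        return None
    get_version.argtypes = [ctypes.POINTER(ctypes.c_int)]
    get_version.restype = ctypes.c_int  # CUresult
    version = ctypes.c_int(0)
    if get_version(ctypes.byref(version)) != CUDA_SUCCESS:
        return None
    return version.value


if __name__ == "__main__":
    path = "/lib64/libcuda.so.1"  # path taken from the error message
    print(probe_symbols(path, ["cuIpcOpenMemHandle", "cuIpcOpenMemHandle_v2"]))
    v = driver_version(path)
    if v is not None:
        print(f"driver supports CUDA up to {v // 1000}.{(v % 1000) // 10}")
```

If the plain `cuIpcOpenMemHandle` is found but the `_v2` variant is not, that would suggest (an assumption, not a confirmed diagnosis) that the driver's libcuda is older than the CUDA headers Open MPI was compiled against.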

I was trying to benchmark a CUDA-aware openmpi-4.1.8, linked with a CUDA-aware ucx-1.19.x, using the OSU micro-benchmarks from https://mvapich.cse.ohio-state.edu/benchmarks/. Here is what I have done so far:

  1. Built the cuda-11.8 toolkit with gcc-8.2.0, then exported its lib64 and bin directories.
  2. Built ucx-1.19.x with CUDA support against that cuda-11.8, then exported its lib and bin directories (gcc-8.2.0).
  3. Built openmpi-4.1.8 against cuda-11.8 to make it CUDA-aware, and linked it with the CUDA-aware ucx-1.19.x (gcc-8.2.0).
  4. Built the OSU benchmarks with that CUDA-aware openmpi-4.1.8 (linked with the CUDA-aware ucx-1.19.x) and with cuda-11.8 (gcc-8.2.0).
  5. Ran osu_bw from the OSU suite; on execution I get the error above.
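To confirm step 3 separately from UCX, `ompi_info` can report whether Open MPI was compiled with CUDA support; as far as I recall, the Open MPI FAQ suggests `ompi_info --parsable --all | grep mpi_built_with_cuda_support:value`. The Python sketch below runs that command and parses the result. `built_with_cuda` and `query_ompi_info` are hypothetical helper names, and the `...:value:true|false` line format is an assumption based on that FAQ entry.

```python
import shutil
import subprocess

# Assumed line format (from the Open MPI FAQ), e.g.
# mca:mpi:base:param:mpi_built_with_cuda_support:value:true
KEY = "mpi_built_with_cuda_support:value:"


def built_with_cuda(ompi_info_text):
    """True/False from `ompi_info --parsable --all` output, or None if the key is absent."""
    for line in ompi_info_text.splitlines():
        if KEY in line:
            return line.rsplit(":", 1)[1].strip() == "true"
    return None


def query_ompi_info():
    """Run the first ompi_info found on PATH; return None if it is not installed."""
    exe = shutil.which("ompi_info")
    if exe is None:
        return None
    result = subprocess.run([exe, "--parsable", "--all"],
                            capture_output=True, text=True, check=False)
    return built_with_cuda(result.stdout)


if __name__ == "__main__":
    print("Open MPI built with CUDA support:", query_ompi_info())
```

This only reflects the build-time configuration. The failing lookup in `/lib64/libcuda.so.1` happens at run time, so a `true` here does not rule out that error.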

One thing I noticed in the CUDA-aware ucx-1.19.x build is that the gdr_copy transport is missing, although cuda_copy and cuda_ipc are present when I check CUDA support with "ucx_info -d | grep -i cuda". I have read that a CUDA-aware UCX should also provide gdr_copy, and that this transport depends on a kernel module called nv_peer_mem or nvidia-peermem. I later found that my driver is missing the nv_peer_mem / nvidia-peermem module. Could this also be the reason for the error above (undefined symbol cuIpcOpenMemHandle_v2 in /lib64/libcuda.so.1)?
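The two observations in the paragraph above (which transports `ucx_info -d` lists, and whether a peer-memory module is loaded) can be gathered with a short Python sketch. It assumes the `#      Transport: <name>` line format of `ucx_info -d` and the usual `/proc/modules` layout, where the kernel lists `nvidia-peermem` as `nvidia_peermem`. It also checks for `gdrdrv`, which is my own addition: as far as I know, UCX's gdr_copy transport comes from the separate GDRCopy library and its `gdrdrv` module. All function names are hypothetical.

```python
import re
import shutil
import subprocess

PEER_MEM = {"nv_peer_mem", "nvidia_peermem"}  # names as shown in /proc/modules
CUDA_TLS = {"cuda_copy", "cuda_ipc", "gdr_copy"}


def ucx_transports(ucx_info_text):
    """Transport names from `ucx_info -d` output (lines like '#      Transport: cuda_copy')."""
    return {m.group(1) for m in re.finditer(r"Transport:\s*(\S+)", ucx_info_text)}


def loaded_modules(proc_modules_text):
    """Module names: the first whitespace-separated field of each /proc/modules line."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def report(ucx_info_text, proc_modules_text):
    tls = ucx_transports(ucx_info_text)
    mods = loaded_modules(proc_modules_text)
    return {
        "cuda_transports": sorted(tls & CUDA_TLS),
        "peer_mem_loaded": bool(mods & PEER_MEM),
        "gdrdrv_loaded": "gdrdrv" in mods,
    }


if __name__ == "__main__":
    exe = shutil.which("ucx_info")
    ucx_text = ""
    if exe is not None:
        ucx_text = subprocess.run([exe, "-d"], capture_output=True,
                                  text=True, check=False).stdout
    try:
        with open("/proc/modules") as f:
            modules_text = f.read()
    except OSError:
        modules_text = ""
    print(report(ucx_text, modules_text))
```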

Thanks a lot for taking the time to read this.

Dcn303, Jul 09 '25 12:07