cuda-aware build fails cuda runtime linking with undefined symbol cuIpcOpenMemHandle_v2

Open mlohry opened this issue 4 years ago • 5 comments

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

v4.1.1 release

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

UCX was installed from the Ubuntu 20.04 .deb packages provided by the UCX project.
Open MPI was configured with:

./configure --enable-fast=all,O3 --prefix=/opt/openmpi-4.1.1 --with-cuda=/usr/local/cuda --with-ucx=/usr/

Please describe the system on which you are running

Operating system/version: Ubuntu 20.04
Computer hardware: Ryzen 9, GeForce GTX Titan Black, NVIDIA driver version 450.51.05, CUDA version 11.0
Network type: N/A

Details of the problem

I am attempting to build "CUDA-aware" Open MPI as instructed in the FAQ, using the configuration shown above. When I compile and run a simple hello world program, even one that contains no CUDA code, I get the following message:

shell$ mpirun -np 2 ./a.out
--------------------------------------------------------------------------
An error occurred while trying to map in the address of a function.
  Function Name: cuIpcOpenMemHandle_v2
  Error string:  /lib/x86_64-linux-gnu/libcuda.so.1: undefined symbol: cuIpcOpenMemHandle_v2
CUDA-aware support is disabled.
--------------------------------------------------------------------------
Hello world from processor hostname, rank 0 out of 2 processors
Hello world from processor hostname, rank 1 out of 2 processors
[hostname:1070999] 1 more process has sent help message help-mpi-common-cuda.txt / dlsym failed
[hostname:1070999] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
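
To narrow this down, one thing that can be checked is which of the two entry points the installed driver library actually exports. The following is a minimal sketch using Python's `ctypes`, not anything shipped with Open MPI; the helper name `has_symbol` is hypothetical, and the library path is simply the one printed in the error message above.

```python
import ctypes


def has_symbol(library, name):
    """Report whether a shared library exports a symbol.

    Returns True or False when the library loads, and None when the
    library itself cannot be loaded. `has_symbol` is an illustrative
    helper, not an Open MPI or CUDA API.
    """
    try:
        handle = ctypes.CDLL(library)
    except OSError:
        return None
    # Attribute lookup on a CDLL object performs the dlsym call and
    # raises AttributeError if the symbol is missing.
    return hasattr(handle, name)


if __name__ == "__main__":
    # Path taken from the error message; adjust if libcuda lives elsewhere.
    lib = "/lib/x86_64-linux-gnu/libcuda.so.1"
    for symbol in ("cuIpcOpenMemHandle", "cuIpcOpenMemHandle_v2"):
        result = has_symbol(lib, symbol)
        if result is None:
            print(f"could not load {lib}")
            break
        print(f"{symbol}: {'present' if result else 'missing'}")
```

If the unversioned name is present but the `_v2` name is missing, one possible explanation is that the CUDA headers used at build time do not match the installed driver library, though I have not confirmed that.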

What am I doing wrong when building CUDA-aware Open MPI?

Nov 22 '21 01:11 mlohry
