OpenMPI Version 5.0.1 seems to break compatibility with CUDA and causes memory allocation errors
Background information
I compiled Kitware's ParaView with MPI support and CUDA enabled. The last time I compiled it was last week, with the latest version of OpenMPI v4. Today, after an update, I got some warning messages regarding CUDA when starting ParaView. I use NVIDIA OptiX to render my visualizations, and when I try to do that the program crashes. I know this might be related to NVIDIA/Kitware, but I encountered the error after upgrading the OpenMPI version.
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
v5.0.1-2
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Via the official Arch Linux Repositories, i.e. via sudo pacman -S openmpi
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
Please describe the system on which you are running
- Operating system/version: Arch Linux
- Computer hardware: i7 13th Gen, NVIDIA RTX 4070ti
- Network type: Local
Details of the problem
[max-pc:36548] mca_base_component_repository_open: unable to open mca_accelerator_cuda: /home/max/Apps/paraview/build/install/lib/libcuda.so.1: file too short (ignored)
[max-pc:36548] mca_base_component_repository_open: unable to open mca_rcache_gpusm: /home/max/Apps/paraview/build/install/lib/libcuda.so.1: file too short (ignored)
[max-pc:36548] mca_base_component_repository_open: unable to open mca_rcache_rgpusm: /home/max/Apps/paraview/build/install/lib/libcuda.so.1: file too short (ignored)
terminate called after throwing an instance of 'optix::Exception'
what(): Memory allocation failed (Details: Function "RTresult _rtContextLaunch2D(RTcontext, unsigned int, RTsize, RTsize)" caught exception: Memory allocation failed)
[max-pc:36548] *** Process received signal ***
[max-pc:36548] Signal: Aborted (6)
[max-pc:36548] Signal code: (-6)
[max-pc:36548] [ 0] /usr/lib/libc.so.6(+0x3c770)[0x7429567be770]
[max-pc:36548] [ 1] /usr/lib/libc.so.6(+0x8d32c)[0x74295680f32c]
[max-pc:36548] [ 2] /usr/lib/libc.so.6(gsignal+0x18)[0x7429567be6c8]
[max-pc:36548] [ 3] /usr/lib/libc.so.6(abort+0xd7)[0x7429567a64b8]
[max-pc:36548] [ 4] /usr/lib/libstdc++.so.6(+0x9ca6f)[0x74294d69ca6f]
[max-pc:36548] [ 5] /usr/lib/libstdc++.so.6(+0xb011c)[0x74294d6b011c]
[max-pc:36548] [ 6] /usr/lib/libstdc++.so.6(+0xb0189)[0x74294d6b0189]
[max-pc:36548] [ 7] /usr/lib/libstdc++.so.6(+0xb03ed)[0x74294d6b03ed]
[max-pc:36548] [ 8] /home/max/Apps/paraview/build/install/lib/libVisRTX.so(+0x539b8)[0x7428fa4539b8]
[max-pc:36548] [ 9] /home/max/Apps/paraview/build/install/lib/libvtkRenderingRayTracing-pv5.12.so.1(+0x38cd1)[0x74294b3e1cd1]
[max-pc:36548] [10] /home/max/Apps/paraview/build/install/lib/libvtkRenderingRayTracing-pv5.12.so.1(_ZN21vtkOSPRayRendererNode6RenderEb+0xa98)[0x74294b41a778]
[max-pc:36548] [11] /home/max/Apps/paraview/build/install/lib/libvtkRenderingRayTracing-pv5.12.so.1(_ZN13vtkOSPRayPass14RenderInternalEPK14vtkRenderState+0x287)[0x74294b4069c7]
[max-pc:36548] [12] /home/max/Apps/paraview/build/install/lib/libvtkRenderingOpenGL2-pv5.12.so.1(_ZN15vtkSequencePass6RenderEPK14vtkRenderState+0x66)[0x74295343b556]
[max-pc:36548] [13] /home/max/Apps/paraview/build/install/lib/libvtkRenderingOpenGL2-pv5.12.so.1(_ZN13vtkCameraPass6RenderEPK14vtkRenderState+0x29d)[0x7429532ba0fd]
[max-pc:36548] [14] /home/max/Apps/paraview/build/install/lib/libvtkRenderingOpenGL2-pv5.12.so.1(_ZN13vtkCameraPass6RenderEPK14vtkRenderState+0x29d)[0x7429532ba0fd]
[max-pc:36548] [15] /home/max/Apps/paraview/build/install/lib/libvtkRenderingOpenGL2-pv5.12.so.1(_ZN17vtkOpenGLRenderer12DeviceRenderEv+0xae)[0x7429533c252e]
[max-pc:36548] [16] /home/max/Apps/paraview/build/install/lib/libvtkRenderingCore-pv5.12.so.1(_ZN11vtkRenderer6RenderEv+0x975)[0x7429529ae755]
[max-pc:36548] [17] /home/max/Apps/paraview/build/install/lib/libvtkRenderingCore-pv5.12.so.1(_ZN21vtkRendererCollection6RenderEv+0xa9)[0x7429529b4c59]
[max-pc:36548] [18] /home/max/Apps/paraview/build/install/lib/libvtkRenderingCore-pv5.12.so.1(_ZN15vtkRenderWindow14DoStereoRenderEv+0x1cd)[0x7429529a187d]
[max-pc:36548] [19] /home/max/Apps/paraview/build/install/lib/libvtkRenderingCore-pv5.12.so.1(_ZN15vtkRenderWindow6RenderEv+0x1c2)[0x7429529a1c62]
[max-pc:36548] [20] /home/max/Apps/paraview/build/install/lib/libvtkRenderingOpenGL2-pv5.12.so.1(_ZN21vtkOpenGLRenderWindow6RenderEv+0xb5)[0x7429533bd185]
[max-pc:36548] [21] /home/max/Apps/paraview/build/install/lib/libvtkRenderingOpenGL2-pv5.12.so.1(_ZN28vtkGenericOpenGLRenderWindow6RenderEv+0x111)[0x7429532e51a1]
[max-pc:36548] [22] /home/max/Apps/paraview/build/install/lib/libvtkRemotingViews-pv5.12.so.1(_ZN15vtkPVRenderView6RenderEbb+0x96e)[0x74294ba223ee]
[max-pc:36548] [23] /home/max/Apps/paraview/build/install/lib/libvtkRemotingViews-pv5.12.so.1(_ZN15vtkPVRenderView11StillRenderEv+0x72)[0x74294ba142c2]
[max-pc:36548] [24] /home/max/Apps/paraview/build/install/lib/libvtkRemotingApplication-pv5.12.so.1(_Z22vtkPVRenderViewCommandP26vtkClientServerInterpreterP13vtkObjectBasePKcRK21vtkClientServerStreamRS5_Pv+0x1b70)[0x74294c25c760]
[max-pc:36548] [25] /home/max/Apps/paraview/build/install/lib/libvtkRemotingClientServerStream-pv5.12.so.1(_ZN26vtkClientServerInterpreter20ProcessCommandInvokeERK21vtkClientServerStreami+0x4dd)[0x7429553da42d]
[max-pc:36548] [26] /home/max/Apps/paraview/build/install/lib/libvtkRemotingClientServerStream-pv5.12.so.1(_ZN26vtkClientServerInterpreter17ProcessOneMessageERK21vtkClientServerStreami+0xbe)[0x7429553da54e]
[max-pc:36548] [27] /home/max/Apps/paraview/build/install/lib/libvtkRemotingClientServerStream-pv5.12.so.1(_ZN26vtkClientServerInterpreter13ProcessStreamERK21vtkClientServerStream+0x1d)[0x7429553da9ed]
[max-pc:36548] [28] /home/max/Apps/paraview/build/install/lib/libvtkRemotingServerManager-pv5.12.so.1(_ZN16vtkPVSessionCore21ExecuteStreamInternalERK21vtkClientServerStreamb+0xfa)[0x742953ee27fa]
[max-pc:36548] [29] /home/max/Apps/paraview/build/install/lib/libvtkRemotingServerManager-pv5.12.so.1(_ZN16vtkPVSessionBase13ExecuteStreamEjRK21vtkClientServerStreamb+0x35)[0x742953ee0665]
[max-pc:36548] *** End of error message ***
Is there a way to disable the warnings about missing libcuda.so.1? I'm also using the Arch package which has CUDA enabled but works without it. So far, I've used OMPI_MCA_opal_warn_on_missing_libcuda=0 but that does not help for these warnings.
@maxawake just curious, but could you try rebuilding with a custom built open-mpi?
Is there a way to disable the warnings about missing libcuda.so.1?
It does not seem so :frowning: https://github.com/open-mpi/ompi/issues/11877#issuecomment-1901909275
@maxawake In your case openmpi is trying to load /home/max/Apps/paraview/build/install/lib/libcuda.so.1 instead of taking libcuda.so.1 from a system path. What is your environment, especially LD_LIBRARY_PATH? Why do you need a custom paraview build anyway? The segfault also seems to come from paraview rather than openmpi itself.
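To illustrate the check suggested above, here is a minimal Python sketch that walks the directories in LD_LIBRARY_PATH, looks for a file named libcuda.so.1 in each, and reports its size and whether it starts with the ELF magic bytes. The function name `find_library_candidates` is hypothetical and not part of any Open MPI or ParaView tooling. The sketch assumes that the loader's "file too short" message means the file it found is too small to hold an ELF header (for example an empty placeholder), and it does not reproduce the loader's full search order (rpath, ld.so.cache, and so on).

```python
import os
from pathlib import Path

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF shared object


def find_library_candidates(lib_name, search_path):
    """Return (path, size_in_bytes, looks_like_elf) for each hit, in search order.

    search_path is a colon-separated list of directories, as in LD_LIBRARY_PATH.
    Empty entries are skipped here for simplicity.
    """
    hits = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue
        candidate = Path(directory) / lib_name
        if candidate.is_file():
            with open(candidate, "rb") as fh:
                header = fh.read(4)
            size = candidate.stat().st_size
            hits.append((str(candidate), size, header == ELF_MAGIC))
    return hits


if __name__ == "__main__":
    search_path = os.environ.get("LD_LIBRARY_PATH", "")
    for path, size, is_elf in find_library_candidates("libcuda.so.1", search_path):
        status = "ELF" if is_elf else "NOT a valid ELF file"
        print(f"{path}: {size} bytes, {status}")
```

If the first entry printed is a tiny or non-ELF file inside the ParaView install directory, that copy is likely shadowing the driver's libcuda.so.1 from the system path.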
@janjust Well, sure, but to be honest I would like to postpone that until March, because I need to finish my thesis and I don't want to mess around with my system too much right now. What configuration would you suggest? Just the same as in the Arch PKGBUILD file?
@lahwaacz Yes, ParaView is quite picky about versions in my experience. This is why I switched to the superbuild option, where basically everything is built from scratch except Qt5, MPI and HDF5, which need to be installed on the system. LD_LIBRARY_PATH is set to /home/max/Apps/paraview/build/install/lib. This is where the superbuild installs and links all necessary libraries. In principle I could also set the install location to /usr/lib, but that would completely screw up my system (did it once, don't want to do this again). I do need a custom ParaView build anyway, because I am currently developing a lot of different VTK filters and ParaView plugins. From the ParaView Plugin Howto page:
To create a plugin, one must have their own build of ParaView. Binaries downloaded from www.paraview.org do not include necessary header files or import libraries (where applicable) for compiling plugins.
And yes, I know that it rather comes from ParaView or, to be more precise, from NVIDIA's OptiX/VisRTX libraries. But the only variable that changed was OpenMPI, which, as explained before, I could not build specifically for my version of ParaView, so I had to rely on my system's version. Probably it's the classic "We said tell your developers to conform to new API standards and throw away old ones which were deprecated 20 years ago", and ParaView uses quite old versions (e.g. OptiX v6), but when I initially searched for the source of these warnings and errors, I came across other OpenMPI-related issues. I am probably going to annoy the ParaView devs with this issue as well.
What I basically did now as a workaround is simply downgrade OpenMPI to v4, and all warnings and errors are gone. Building with v5 is not possible right now because CMake does not find any suitable version. So I hope Kitware changes their implementation soon.
@maxawake Thanks for explaining your use case. If I understand correctly, you just upgraded openmpi and did not rebuild paraview, which resulted in the segfault? That's not too unexpected as v4 to v5 is a major upgrade so you'd need to rebuild. FWIW, paraview in Arch Linux repositories was built fine with openmpi 5.