ompi
Disable rpathing on build
It seems that Open MPI's build process adds rpath entries for all linked libraries.
This leads to problems when custom libraries from different locations are used. See https://github.com/ofiwg/libfabric/issues/11021 for details of the same issue.
Are there any options to disable rpathing the libraries, or at least to exclude specific paths? For us, for example, it would be enough to exclude /usr/lib64.
Are you asking about Open MPI's build process or the wrapper compilers?
If you're asking about the wrapper compilers, you should be able to specify --disable-wrapper-rpath when building Open MPI; this affects the behavior of Open MPI's wrapper compilers. You can also edit the wrapper compilers' text config files.
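As a rough sketch of how one might check the effect of that flag: the configure line in the comment uses the option named above, and the check relies on the wrapper's `--showme:link` option to print the link flags it would pass. The helper name `count_rpath_flags` is made up for this illustration, and the prefix is just a placeholder.

```shell
# At build time (the prefix is a placeholder):
#   ./configure --prefix="$HOME/bogus" --disable-wrapper-rpath

# Hypothetical helper: count the rpath-related tokens in a link line on stdin.
count_rpath_flags() {
    # grep -c exits non-zero when the count is 0, so mask that status
    tr ' ' '\n' | grep -c -e '-rpath' || true
}

# After "make install": ask the wrapper which link flags it would add.
if command -v mpicc >/dev/null 2>&1; then
    mpicc --showme:link | count_rpath_flags
fi
```

If the wrappers were built with --disable-wrapper-rpath, the expectation is that this count is zero. Note that this only covers the wrapper compilers, not the rpath entries embedded in Open MPI's own libraries.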
If you're asking about Open MPI's build process, can you double check that Open MPI is adding -rpath for /usr/lib64? I could be wrong, but I thought that GNU Libtool only adds -rpath for directories that are not in default library search paths.
More specifically: IIRC, Libtool is who adds the -rpath entries -- Open MPI's build logic doesn't do anything specific there.
This is about Open MPI's build process. And it might indeed be Libtool adding that.
> GNU Libtool only adds -rpath for directories that are not in default library search paths.
I'd like to verify that. Do you have any further information on that like docs and what is considered "default library search paths"?
You can run make V=1 and see the full build commands that Open MPI invokes. You can search through there for -rpath instances.
I did a quick build myself on an Alma 9 VM:
./configure --prefix=$HOME/bogus ...
make V=1 -j 32 |& tee out.txt
Looking in out.txt, I only saw rpath references to $HOME/bogus/lib, even though Open MPI's shared libraries were also linked against shared libraries in system-default locations (/usr/lib and the like).
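For anyone wanting to repeat that check, here is a minimal sketch that lists the distinct rpath directories in such a log. It assumes the flags appear in the `-Wl,-rpath -Wl,<dir>` form that Libtool emits in this thread; the function name `list_rpath_dirs` is invented for the example.

```shell
# Hypothetical helper: print each distinct directory that appears as
# "-Wl,-rpath -Wl,<dir>" in the build log given as $1.
list_rpath_dirs() {
    grep -o -e '-Wl,-rpath -Wl,[^ ]*' "$1" \
        | sed 's/^-Wl,-rpath -Wl,//' \
        | sort -u
}

# Usage against the log captured by "make V=1 ... |& tee out.txt"
if [ -f out.txt ]; then
    list_rpath_dirs out.txt
fi
```

A log that uses the single-argument `-Wl,-rpath,<dir>` spelling would need a slightly different pattern.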
Worst case, if Open MPI and/or libfabric's build processes put in rpath entries that aren't workable for you, you might try using chrpath(1) to change them after the fact.
Yes, I see it in the output. It looks like:
libtool: link: gcc -Wall -O2 -DNDEBUG ... -pthread -Wl,-rpath -Wl,/usr/lib64
What might be relevant: I have a libxpmem.la file in /usr/lib64
> What might be relevant: I have a libxpmem.la file in /usr/lib64
That may very well be what's doing it. A .la file is a Libtool Archive, so Libtool may be reading that metadata file and adding an rpath to /usr/lib64.
There's not a lot we can do from the Open MPI side. Some options for you:
- Try chrpath(1).
- Remove libxpmem from /usr/lib64 (and presumably install it elsewhere).
- Or, if xpmem is not the problem, remove other inbox outdated software from /usr/lib64 (and only have newer versions installed elsewhere).
- Try removing /usr/lib64/libxpmem.la to see if Libtool then doesn't add an rpath for it.
Taking a step back: the real problem may be that you have both old and new versions of library X in multiple different directories. If you have other shared libraries that you're using in this process in the same directory as the old version of library X, then the linker may still end up finding the old version of library X regardless of rpath (based on other factors that you can't control, or would take a bunch of effort for you to control).
Remember: the linker is incredibly complicated. Every time I think I understand the linker, I ultimately discover that I apparently don't know jack about the linker...
@Flamefire Is there a -L/usr/lib64 in the command line?
@ggouaillardet No, there is no such flag. But there is -L/usr/lib, and IIRC the linker automatically searches the lib64 folder for any lib folder. The resulting command looks like:
gcc -DNDEBUG -Wl,-rpath -Wl,/software/PMIx/lib -Wl,--enable-new-dtags -o .libs/test_pvar_access test_pvar_access.o -L/software/UCC/lib64 -L/software/UCC/lib -L/software/PRRTE/lib64 -L/software/PRRTE/lib -L/software/PMIx/lib64 -L/software/PMIx/lib -L/software/libfabric/lib64 -L/software/libfabric/lib -L/software/UCX/lib64 -L/software/UCX/lib -L/software/libevent/lib64 -L/software/libevent/lib -L/software/hwloc/lib64 -L/software/hwloc/lib -L/software/zlib/lib64 -L/software/zlib/lib -L/software/pkgconf/lib64 -L/software/pkgconf/lib -L/software/GCCcore/lib64 -L/software/GCCcore/lib ../../ompi/.libs/libmpi.so -L/software/OpenSSL/lib64 -L/software/OpenSSL/lib -L/software/binutils/lib64 -L/software/binutils/lib -L/software/XZ/lib64 -L/software/XZ/lib -L/software/gettext/lib64 -L/software/gettext/lib -L/software/libpciaccess/lib64 -L/software/libpciaccess/lib -L/software/libxml2/lib64 -L/software/libxml2/lib -L/software/numactl/lib64 -L/software/numactl/lib -L/software/Bison/lib64 -L/software/Bison/lib -L/software/flex/lib64 -L/software/flex/lib -L/usr/lib /software/UCC/lib/libucc.so /build>/openmpi-5.0.3/opal/.libs/libopen-pal.so ../../opal/.libs/libopen-pal.so /software/libfabric/lib/libfabric.so -lrdmacm -libverbs -luuid /software/numactl/lib/libnuma.so /software/GCCcore/lib/../lib64/libatomic.so /software/UCX/lib/libucp.so /software/UCX/lib/libuct.so /software/UCX/lib/libucs.so /software/UCX/lib/libucm.so /software/binutils/lib/libbfd.so -liberty -lzstd /software/binutils/lib/libsframe.so /usr/lib64/libxpmem.so -lrt /software/PMIx/lib/libpmix.so -lmunge -lutil /software/libevent/lib/libevent_core.so /software/libevent/lib/libevent_pthreads.so /software/hwloc/lib/libhwloc.so -lpciaccess /software/libxml2/lib/libxml2.so -ldl -lz /software/XZ/lib/liblzma.so -lm -lpthread -pthread -Wl,-rpath -Wl,/software/OpenMPI/lib -Wl,-rpath -Wl,/software/UCC/lib -Wl,-rpath -Wl,/software/libfabric/lib -Wl,-rpath -Wl,/software/hwloc/lib -Wl,-rpath -Wl,/software/numactl/lib -Wl,-rpath 
-Wl,/software/GCCcore/lib/../lib64 -Wl,-rpath -Wl,/software/UCX/lib -Wl,-rpath -Wl,/software/binutils/lib -Wl,-rpath -Wl,/usr/lib64 -Wl,-rpath -Wl,/software/libevent/lib -Wl,-rpath -Wl,/software/PMIx/lib -Wl,-rpath -Wl,/software/libxml2/lib -Wl,-rpath -Wl,/software/XZ/lib
Did you try the other things I recommended?
@Flamefire Open MPI is supposed to filter out -L/usr/lib64 (that could imply -Wl,-rpath,/usr/lib64) so I wanted to double check there was no issue here.
I suggest you follow @jsquyres's recommendation. -Wl,-rpath,/usr/lib64 is likely pulled from /usr/lib64/libxpmem.la, so try removing it first, or manually edit it.
> Open MPI is supposed to filter out -L/usr/lib64 (that could imply -Wl,-rpath,/usr/lib64) so I wanted to double check there was no issue here.
Not exactly sure where that filtering needs to happen. libxpmem.la only contains libdir='/usr/lib64' which is then picked up and converted to a -rpath flag.
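To see which directory a given .la file would contribute, one can read its `libdir=` line directly. The following is only a sketch: `la_libdir` is a name invented here, and it assumes the usual single-quoted `libdir='...'` form quoted above.

```shell
# Hypothetical helper: print the libdir recorded in the Libtool archive $1.
la_libdir() {
    sed -n "s/^libdir='\(.*\)'\$/\1/p" "$1"
}

# Usage: show the libdir of every .la file in /usr/lib64, if there are any.
for la in /usr/lib64/*.la; do
    [ -f "$la" ] || continue
    printf '%s -> %s\n' "$la" "$(la_libdir "$la")"
done
```

Any .la file listed there with a libdir of /usr/lib64 is a candidate source of the unwanted -rpath flag when the corresponding library is linked in.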
This might be a libtool issue though. I opened an issue with them: https://github.com/autotools-mirror/libtool/issues/10
I cannot remove that file as I have no root permissions on that system. So the only workaround I found is passing --without-xpmem to Open MPI's configure (and libfabric's before that).
Do you have any idea if that has a significant performance impact? The machine this is on is a VERY large shared memory machine (18TB w/ 672 cores).
Have you tried chrpath(1)?
That likely works, although it requires manual effort, which I'd like to avoid.
So the easiest option might be to consistently not use xpmem, if there is no significant effect on e.g. performance.
I thought that running chrpath(1) after running make install wouldn't be too much effort; you said you wanted to just remove one rpath entry (/usr/lib64).
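A post-install step along those lines could look roughly like the sketch below. It assumes chrpath's `-l` option prints a line of the form `<file>: RPATH=<dirs>` (or `RUNPATH=`) and that `-r` replaces the whole path string. The function names are made up for this example, and it has not been tested against an actual Open MPI install.

```shell
# Hypothetical helper: drop directory $2 from the colon-separated rpath $1.
strip_rpath_entry() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -v -x -F -e "$2" | paste -s -d ':' -
}

# Hypothetical helper: rewrite the rpath of ELF file $1 without /usr/lib64.
fix_rpath() {
    old=$(chrpath -l "$1" | sed -n 's/^.*: R\(UN\)\{0,1\}PATH=//p')
    new=$(strip_rpath_entry "$old" /usr/lib64)
    # Only touch the file if the entry was actually present.
    [ "$old" = "$new" ] || chrpath -r "$new" "$1"
}

# String-level demonstration; no ELF file is involved here.
strip_rpath_entry "/software/UCX/lib:/usr/lib64:/software/PMIx/lib" /usr/lib64
```

After `make install`, `fix_rpath` would then be run on each installed shared library and executable, for example in a loop over the files under the install prefix. Since the new string is shorter than the old one, it should fit in the space chrpath has to work with.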
Failing that, I think you should try removing /usr/lib64/libxpmem.la (or at least renaming it so that Libtool doesn't find it), as we have suggested a few times. You may have to ask someone with root perms to do it.
If you don't want to do the things that we're suggesting, I'm not sure where else to go.
As for performance: there's a reason we wrote the software to use xpmem; 1-copy methods definitely help in a bunch of different situations (e.g., compared to btl/sm, which uses 2-copy methods). It very much depends on what your application is doing, what your specific environment is, how you are launching and running, ... etc. Without knowing what your application is doing, and without knowing how else you configured Open MPI (you didn't include the information requested by the github issue template), it's hard to say.
It looks like this issue is expecting a response, but hasn't gotten one yet. If there are no responses in the next 2 weeks, we'll assume that the issue has been abandoned and will close it.
Per the above comment, it has been a month with no reply on this issue. It looks like this issue has been abandoned.
I'm going to close this issue. If I'm wrong and this issue is not abandoned, please feel free to re-open it. Thank you!