
Disable rpathing on build

Open Flamefire opened this issue 6 months ago • 15 comments

It seems that Open MPI's build process adds rpath entries for all linked libraries.

This leads to problems when custom libraries from different locations are used. See https://github.com/ofiwg/libfabric/issues/11021 for details of the same issue.

Are there any options to disable rpathing the libraries or at least exclude specific paths? E.g. for us it would be enough to exclude /usr/lib64

Flamefire avatar May 14 '25 15:05 Flamefire

Are you asking about Open MPI's build process or the wrapper compilers?

If you're asking about the wrapper compilers, you should be able to specify --disable-wrapper-rpath when building Open MPI. This affects the behavior of Open MPI's wrapper compilers. You can also edit the text config files of Open MPI's wrapper compilers.
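
As a rough sketch of the wrapper-compiler route (not a tested recipe): the prefix below is only a placeholder, and the configure step assumes you are at the top of an unpacked Open MPI source tree. Both steps are guarded so the snippet does nothing harmful elsewhere.

```shell
# Placeholder install prefix; override with PREFIX=... if desired.
prefix="${PREFIX:-$HOME/bogus}"
wrapper_flag=--disable-wrapper-rpath

# Show the configure command this sketch would run.
echo "./configure --prefix=$prefix $wrapper_flag"

# Only run configure when we really are in a source tree.
if [ -x ./configure ]; then
    ./configure --prefix="$prefix" "$wrapper_flag"
fi

# After installation, the wrapper can print the link flags it would add,
# which makes it easy to see whether any -rpath entries remain.
if command -v mpicc >/dev/null 2>&1; then
    mpicc --showme:link
fi
```

Note that this only changes what mpicc and friends add when linking applications; it does not change how Open MPI's own libraries are linked.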

If you're asking about Open MPI's build process, can you double check that Open MPI is adding -rpath for /usr/lib64? I could be wrong, but I thought that GNU Libtool only adds -rpath for directories that are not in default library search paths.

More specifically: IIRC, Libtool is who adds the -rpath entries -- Open MPI's build logic doesn't do anything specific there.
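
One way to check which directories Libtool treats as system defaults is to look at the generated libtool script in the build tree. The sketch below assumes the `sys_lib_dlsearch_path_spec` variable appears in the `./libtool --config` output in the form shown; the helper name and the sample line are made up for illustration.

```shell
# Hypothetical helper: read `libtool --config` output on stdin and print
# one run-time system search directory per line.
libtool_sys_dirs() {
    sed -n 's/^sys_lib_dlsearch_path_spec="\(.*\)"$/\1/p' | tr -s ' ' '\n'
}

# Made-up sample line, standing in for real `./libtool --config` output.
printf '%s\n' 'sys_lib_dlsearch_path_spec="/lib /usr/lib /usr/lib64"' | libtool_sys_dirs

# In a configured build tree, query the real script instead.
if [ -x ./libtool ]; then
    ./libtool --config | libtool_sys_dirs
fi
```

If /usr/lib64 is missing from the real list on a given system, that could explain why an rpath entry is generated for it.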

jsquyres avatar May 15 '25 13:05 jsquyres

This is about Open MPI's build process. And it might indeed be Libtool adding that.

> GNU Libtool only adds -rpath for directories that are not in default library search paths.

I'd like to verify that. Do you have any further information on that, such as docs, and on what is considered "default library search paths"?

Flamefire avatar May 15 '25 14:05 Flamefire

You can run make V=1 and see the full build commands that Open MPI invokes. You can search through there for -rpath instances.

I did a quick build myself on an Alma 9 VM:

./configure --prefix=$HOME/bogus ...
make V=1 -j 32 |& tee out.txt

Looking in out.txt, I only saw rpath references to $HOME/bogus/lib, even though Open MPI's shared libraries were also linked against shared libraries in system-default locations (/usr/lib and the like).
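
A small sketch of how the search through out.txt could be automated. The helper name is hypothetical, and the sample log line is a shortened stand-in built from paths mentioned in this thread; point the function at your real out.txt instead.

```shell
# Hypothetical helper: list the distinct rpath directories in a build log.
# Handles both the "-Wl,-rpath -Wl,DIR" and "-Wl,-rpath,DIR" spellings.
list_rpaths() {
    grep -oE -- '-Wl,-rpath[ ,](-Wl,)?[^ ]+' "$1" \
        | sed -E 's/^-Wl,-rpath[ ,](-Wl,)?//' \
        | sort -u
}

# Stand-in for out.txt: a single shortened libtool link line.
sample=$(mktemp)
printf '%s\n' 'libtool: link: gcc -Wall -O2 -DNDEBUG -pthread -Wl,-rpath -Wl,/usr/lib64 -Wl,-rpath -Wl,/software/PMIx/lib' > "$sample"

list_rpaths "$sample"
```

Any directory in the output that is a system default location (such as /usr/lib64) is worth tracing back to the link line that introduced it.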

Worst case, if Open MPI and/or libfabric's build processes put in rpath entries that aren't workable for you, you might try using chrpath(1) to change them after the fact.

jsquyres avatar May 15 '25 14:05 jsquyres

Yes, I see it in the output. It looks like:
libtool: link: gcc -Wall -O2 -DNDEBUG ... -pthread -Wl,-rpath -Wl,/usr/lib64

What might be relevant: I have a libxpmem.la file in /usr/lib64

Flamefire avatar May 15 '25 14:05 Flamefire

> What might be relevant: I have a libxpmem.la file in /usr/lib64

That may very well be what's doing it. A .la file is a Libtool Archive, so Libtool may be reading that metadata file and adding an rpath to /usr/lib64.

There's not a lot we can do from the Open MPI side. Some options for you:

  1. Try chrpath(1).
  2. Remove libxpmem from /usr/lib64 (and presumably install it elsewhere).
  3. Or, if xpmem is not the problem, remove other inbox outdated software from /usr/lib64 (and only have newer versions installed elsewhere)
  4. Try removing /usr/lib64/libxpmem.la to see if Libtool then doesn't add an rpath for it.

Taking a step back: the real problem may be that you have both old and new versions of library X in multiple different directories. If you have other shared libraries that you're using in this process in the same directory as the old version of library X, then the linker may still end up finding the old version of library X regardless of rpath (based on other factors that you can't control, or would take a bunch of effort for you to control).

Remember: the linker is incredibly complicated. Every time I think I understand the linker, I ultimately discover that I apparently don't know jack about the linker...

jsquyres avatar May 15 '25 15:05 jsquyres

@Flamefire Is there a -L/usr/lib64 in the command line?

ggouaillardet avatar May 15 '25 15:05 ggouaillardet

@ggouaillardet No there is no such flag. But there is -L/usr/lib and IIRC the linker automatically searches the lib64 folder for any lib folder. The resulting command looks like:

gcc -DNDEBUG -Wl,-rpath -Wl,/software/PMIx/lib -Wl,--enable-new-dtags -o .libs/test_pvar_access test_pvar_access.o -L/software/UCC/lib64 -L/software/UCC/lib -L/software/PRRTE/lib64 -L/software/PRRTE/lib -L/software/PMIx/lib64 -L/software/PMIx/lib -L/software/libfabric/lib64 -L/software/libfabric/lib -L/software/UCX/lib64 -L/software/UCX/lib -L/software/libevent/lib64 -L/software/libevent/lib -L/software/hwloc/lib64 -L/software/hwloc/lib -L/software/zlib/lib64 -L/software/zlib/lib -L/software/pkgconf/lib64 -L/software/pkgconf/lib -L/software/GCCcore/lib64 -L/software/GCCcore/lib ../../ompi/.libs/libmpi.so -L/software/OpenSSL/lib64 -L/software/OpenSSL/lib -L/software/binutils/lib64 -L/software/binutils/lib -L/software/XZ/lib64 -L/software/XZ/lib -L/software/gettext/lib64 -L/software/gettext/lib -L/software/libpciaccess/lib64 -L/software/libpciaccess/lib -L/software/libxml2/lib64 -L/software/libxml2/lib -L/software/numactl/lib64 -L/software/numactl/lib -L/software/Bison/lib64 -L/software/Bison/lib -L/software/flex/lib64 -L/software/flex/lib -L/usr/lib /software/UCC/lib/libucc.so /build>/openmpi-5.0.3/opal/.libs/libopen-pal.so ../../opal/.libs/libopen-pal.so /software/libfabric/lib/libfabric.so -lrdmacm -libverbs -luuid /software/numactl/lib/libnuma.so /software/GCCcore/lib/../lib64/libatomic.so /software/UCX/lib/libucp.so /software/UCX/lib/libuct.so /software/UCX/lib/libucs.so /software/UCX/lib/libucm.so /software/binutils/lib/libbfd.so -liberty -lzstd /software/binutils/lib/libsframe.so /usr/lib64/libxpmem.so -lrt /software/PMIx/lib/libpmix.so -lmunge -lutil /software/libevent/lib/libevent_core.so /software/libevent/lib/libevent_pthreads.so /software/hwloc/lib/libhwloc.so -lpciaccess /software/libxml2/lib/libxml2.so -ldl -lz /software/XZ/lib/liblzma.so -lm -lpthread -pthread -Wl,-rpath -Wl,/software/OpenMPI/lib -Wl,-rpath -Wl,/software/UCC/lib -Wl,-rpath -Wl,/software/libfabric/lib -Wl,-rpath -Wl,/software/hwloc/lib -Wl,-rpath -Wl,/software/numactl/lib -Wl,-rpath -Wl,/software/GCCcore/lib/../lib64 -Wl,-rpath -Wl,/software/UCX/lib -Wl,-rpath -Wl,/software/binutils/lib -Wl,-rpath -Wl,/usr/lib64 -Wl,-rpath -Wl,/software/libevent/lib -Wl,-rpath -Wl,/software/PMIx/lib -Wl,-rpath -Wl,/software/libxml2/lib -Wl,-rpath -Wl,/software/XZ/lib

Flamefire avatar May 16 '25 13:05 Flamefire

Did you try the other things I recommended?

jsquyres avatar May 16 '25 17:05 jsquyres

@Flamefire Open MPI is supposed to filter out -L/usr/lib64 (that could imply -Wl,-rpath,/usr/lib64) so I wanted to double check there was no issue here.

I suggest you follow @jsquyres's recommendation. -Wl,-rpath,/usr/lib64 is likely pulled from /usr/lib64/libxpmem.la, so try removing it first, or manually edit it.

ggouaillardet avatar May 17 '25 06:05 ggouaillardet

> Open MPI is supposed to filter out -L/usr/lib64 (that could imply -Wl,-rpath,/usr/lib64) so I wanted to double check there was no issue here.

Not exactly sure where that filtering needs to happen. libxpmem.la only contains libdir='/usr/lib64' which is then picked up and converted to a -rpath flag.
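
For anyone who wants to check their own .la files, here is a minimal sketch of reading that libdir value. Libtool archives are plain text with shell-style assignments; the helper name is hypothetical and the sample file is a made-up stand-in for /usr/lib64/libxpmem.la.

```shell
# Hypothetical helper: print the libdir recorded in a Libtool archive.
la_libdir() {
    sed -n "s/^libdir='\(.*\)'\$/\1/p" "$1"
}

# Made-up stand-in containing only the line of interest.
sample=$(mktemp)
printf '%s\n' '# sample .la content (illustrative only)' "libdir='/usr/lib64'" > "$sample"

la_libdir "$sample"
```

Running the helper against the real file requires only read access, so it works without root permissions.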

This might be a libtool issue though. I opened an issue with them: https://github.com/autotools-mirror/libtool/issues/10

I cannot remove that file as I have no root permissions on that system. So the only workaround I found is passing --without-xpmem to Open MPI's configure (and to libfabric's before that).
Do you have any idea whether that has a significant performance impact? The machine this is on is a VERY large shared memory machine (18TB w/ 672 cores).

Flamefire avatar May 20 '25 14:05 Flamefire

Have you tried chrpath(1)?

jsquyres avatar May 23 '25 22:05 jsquyres

That likely works, although it requires manual effort, which I'd like to avoid.

So the easiest option might be to consistently not use xpmem, if that has no significant effect on e.g. performance.

Flamefire avatar May 24 '25 09:05 Flamefire

I thought that running chrpath(1) after running make install wouldn't be too much effort; you said you wanted to just remove one rpath entry (/usr/lib64).
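
A sketch of what that post-install step might look like, assuming chrpath is installed and that `chrpath -l` prints a line of the form `file: RPATH=...` or `file: RUNPATH=...`. The helper name and the library path are assumptions; adjust them to your installation.

```shell
# Hypothetical helper: drop one directory from a colon-separated rpath list.
# $1 = rpath list, $2 = directory to remove (exact match only).
strip_rpath_entry() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -vxF -- "$2" | paste -sd ':' -
}

# Quick demonstration on a shortened list from this thread.
strip_rpath_entry '/software/UCX/lib:/usr/lib64:/software/PMIx/lib' /usr/lib64

# Assumed install location; override with OMPI_LIB=... as needed.
lib="${OMPI_LIB:-/software/OpenMPI/lib/libmpi.so}"
if command -v chrpath >/dev/null 2>&1 && [ -f "$lib" ]; then
    current=$(chrpath -l "$lib")
    case "$current" in
        *PATH=*)
            # Keep everything after "RPATH=" / "RUNPATH=" and rewrite it.
            chrpath -r "$(strip_rpath_entry "${current#*PATH=}" /usr/lib64)" "$lib"
            ;;
    esac
fi
```

Because the new path is shorter than the old one, chrpath should be able to rewrite it in place. The same loop would need to cover every installed library and executable that carries the unwanted entry.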

Failing that, I think you should try removing /usr/lib64/libxpmem.la (or at least renaming it so that Libtool doesn't find it), as we have suggested a few times. You may have to ask someone with root perms to do it.

If you don't want to do the things that we're suggesting, I'm not sure where else to go.

jsquyres avatar May 24 '25 13:05 jsquyres

As for performance: there's a reason we wrote the software to use xpmem; 1-copy methods definitely help in a bunch of different situations (e.g., compared to btl/sm, which uses 2-copy methods). It very much depends on what your application is doing, what your specific environment is, how you are launching and running, ... etc. Without knowing what your application is doing, and without knowing how else you configured Open MPI (you didn't include the information requested by the github issue template), it's hard to say.

jsquyres avatar May 24 '25 13:05 jsquyres

It looks like this issue is expecting a response, but hasn't gotten one yet. If there are no responses in the next 2 weeks, we'll assume that the issue has been abandoned and will close it.

github-actions[bot] avatar Jun 08 '25 21:06 github-actions[bot]

Per the above comment, it has been a month with no reply on this issue. It looks like this issue has been abandoned.

I'm going to close this issue. If I'm wrong and this issue is not abandoned, please feel free to re-open it. Thank you!

github-actions[bot] avatar Jun 22 '25 21:06 github-actions[bot]