Building with NAG+gcc pulls in unsupported -pthread flag

Open mathomp4 opened this issue 6 months ago • 7 comments

This issue concerns NAG Fortran (with GCC for C/C++) and MPICH 4.3.1.

In #7346, I noted that once the LOGICAL issue there was fixed (in #7551 by @hzhou), the build still failed with:

make[2]: Entering directory '/ford1/share/gmao_SIteam/MPI/src/mpich-4.3.0/build-nag-7.2.13'
  GEN      lib/libmpifort.la
libtool: warning: '/ford1/local/gcc/gcc-12.1.0/lib64/libatomic.la' seems to be moved
NAG Fortran Compiler Release 7.2(Shin-Urayasu) Build 7213
Option error: Unrecognised option -pthread
make[2]: *** [Makefile:13010: lib/libmpifort.la] Error 2
make[2]: Leaving directory '/ford1/share/gmao_SIteam/MPI/src/mpich-4.3.0/build-nag-7.2.13'
make[1]: *** [Makefile:40263: all-recursive] Error 1
make[1]: Leaving directory '/ford1/share/gmao_SIteam/MPI/src/mpich-4.3.0/build-nag-7.2.13'
make: *** [Makefile:10489: all] Error 2

As the error states, nagfor does not recognize -pthread, so the build fails.

I usually build with:

../configure CC=gcc CXX=g++ FC=nagfor MPICHLIB_FFLAGS=-mismatch -fpp MPICHLIB_FCFLAGS=-mismatch -fpp --enable-f08 CFLAGS=-I/ford1/share/gmao_SIteam/nag/7.2.36/lib/NAG_Fortran --prefix=/ford1/share/gmao_SIteam/MPI/mpich/4.3.1/nag-7.2.36

and I've also tried adding --with-hwloc=embedded --with-libfabric=embedded --with-ucx=embedded --with-yaksa=embedded, in the hope that the issue was some system library being brought in.

I'm attaching a verbose logfile:

make.nag-7.2.36.log.gz


NOTE: I encountered something similar with Open MPI (see https://github.com/open-mpi/ompi/issues/12413), but @ggouaillardet traced that to, I think, libevent leaking flags in. That was one reason I tried all the --with-foo=embedded options above, in case some external library was feeding in the -pthread.

mathomp4 avatar Aug 29 '25 15:08 mathomp4

Looks like it is pulled in from libfabric, which pulls in -lefa -lnl-3 -lnl-route-3 -libverbs -luuid -lnuma. I guess one of these libraries is installed with a .la file that contains -pthread from its original build environment.
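
One way to check that guess is to search the installed libtool archives for the flag. The following is a minimal sketch, not something from the MPICH tree: the helper name `find_pthread_la` is hypothetical, and the directories you pass to it are assumptions that depend on where the libraries above are installed. In a .la file the flag typically shows up in a field such as `inherited_linker_flags` or `dependency_libs`.

```shell
# Hypothetical helper: print every libtool archive (*.la) under the given
# directories whose contents mention -pthread.
find_pthread_la() {
    for dir in "$@"; do
        # Skip arguments that are not existing directories.
        [ -d "$dir" ] || continue
        # grep -l prints only the names of matching files; -e keeps the
        # leading dash of the pattern from being parsed as an option.
        # grep exits non-zero when nothing matches, which is not an error
        # here, so that status is deliberately ignored.
        find "$dir" -name '*.la' -exec grep -l -e '-pthread' {} + || :
    done
}

# Usage (directories are assumptions, adjust to your system):
#   find_pthread_la /usr/lib64 /usr/local/lib
```

Any file it prints is a candidate for the library that feeds -pthread into the libmpifort link line.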

hzhou avatar Aug 29 '25 16:08 hzhou

@hzhou I am building this on a single-node machine, so all my MPI traffic is intranode. Given that, I imagine I have no need for, say, verbs support.

Is there a way to build MPICH that avoids much of that? Something like --with-device=foo, or something with netmod?

mathomp4 avatar Aug 29 '25 22:08 mathomp4

You can disable most of the libfabric providers by passing configure options such as --disable-efa --disable-verbs --disable-opx --disable-psm3 --disable-psm2 --disable-lnx. Let's try that to see if it helps. If it does, we can then figure out which provider is the offender.

hzhou avatar Aug 31 '25 04:08 hzhou

@hzhou I just tried your options and:

configure: WARNING: unrecognized options: --disable-efa, --disable-verbs, --disable-opx, --disable-psm3, --disable-psm2, --disable-lnx

and, well, that seems to be all your options. 🙁 And it did fail the same way.

mathomp4 avatar Sep 03 '25 15:09 mathomp4

This may be related: I remember a previous build issue with NAG where I suspected that flags were getting reordered at the link stage. I'm not sure whether it's a libtool or a nagfor problem, but it meant that flags intended for the linker were mistakenly interpreted by the compiler. https://github.com/pmodels/mpich/issues/4358#issuecomment-1281265217

raffenet avatar Sep 16 '25 18:09 raffenet

@mathomp4 if you are able, could you try out the most recent release candidate and confirm whether this issue still exists? https://github.com/pmodels/mpich/releases/tag/v4.3.2rc2

raffenet avatar Sep 30 '25 16:09 raffenet

FWIW, in Open MPI we apply some patches to fix/enhance autotools. For NAG compilers, we apply https://github.com/open-mpi/ompi/blob/main/config/ltmain_nag_pthread.diff to config/ltmain.sh at the end of autogen.pl. We also patch the generated configure to support shared libraries (not sure that is still needed, though): https://github.com/open-mpi/ompi/blob/41a68b59212d7c3341eb0572d8cd4deb66bbf896/autogen.pl#L963-L998 (note that these patches/regexes may have to be adapted depending on which autotools versions you are using).

ggouaillardet avatar Oct 01 '25 00:10 ggouaillardet