Apple linker does not accept `-commons use_dylibs` flag anymore

Open fxcoudert opened this issue 1 year ago • 22 comments

Background information

What version of Open MPI are you using? 5.0.2

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Installed from released sources as part of Homebrew build (https://github.com/Homebrew/homebrew-core/pull/166807)

Please describe the system on which you are running

  • Operating system/version: macOS 14.4
  • Computer hardware: Apple M1
  • Network type: not relevant

Details of the problem

Compiling any Fortran MPI code with mpifort hellof.f90 -o hellof with Xcode 15.3 gives:

  ld: warning: -commons use_dylibs is no longer supported, using error treatment instead
  ld: common symbol '_mpi_fortran_argv_null_' from '/private/tmp/cclH6ubZ.o' conflicts with definition from dylib '_mpi_fortran_argv_null_' from '/opt/homebrew/Cellar/open-mpi/5.0.2_1/lib/libmpi_usempi_ignore_tkr.40.dylib'
  collect2: error: ld returned 1 exit status

That is because -commons use_dylibs is now ignored (giving the warning), which leads to the symbol being rejected as defined twice.

fxcoudert avatar Mar 21 '24 17:03 fxcoudert

I have reported the regression (compared to Xcode 14 and earlier linkers) to Apple as FB13194355.

fxcoudert avatar Mar 21 '24 17:03 fxcoudert

I encountered a similar problem when trying to install the superlu-dist package using spack with [email protected]

ld: warning: ignoring duplicate libraries: '-lemutls_w', '-lgcc', '-lgfortran', '-lmpi', '-lmpi_mpifh', '-lmpi_usempi_ignore_tkr', '-lmpi_usempif08', '-lquadmath'

ld: warning: -commons use_dylibs is no longer supported, using error treatment instead

ld: warning: ignoring duplicate libraries: '-lemutls_w', '-lgcc', '-lgfortran', '-lmpi', '-lmpi_mpifh', '-lmpi_usempi_ignore_tkr', '-lmpi_usempif08', '-lquadmath'

ld: common symbol '_mpi_fortran_argv_null_' from '/private/var/folders/pd/9hc154y94k9_t_rb4lw0vcw00000gn/T/neoh/spack-stage/spack-stage-superlu-dist-8.2.1-h3rdwb66k3wb4s6gjglymknnv4xor3nf/spack-build-h3rdwb6/FORTRAN/CMakeFiles/f_pddrive.dir/f_pddrive.F90.o' conflicts with definition from dylib '_mpi_fortran_argv_null_' from '/Users/neoh/spack/opt/spack/darwin-sonoma-m2/apple-clang-15.0.0/openmpi-5.0.2-ja66vwemf6adckixrk6njhmaglqwck6v/lib/libmpi_usempi_ignore_tkr.40.dylib'

Was this warning treated differently in earlier versions of apple-clang?

neoh54 avatar Mar 26 '24 05:03 neoh54

It might be worth trying LDFLAGS=-ld_classic

Not sure if this is related to this issue, though.

If this fixes the issue, all the credit should go to @jeffhammond https://twitter.com/science_dot/status/1772314603692626154

ggouaillardet avatar Mar 26 '24 07:03 ggouaillardet

Is Open MPI getting these flags from GNU Libtool? I.e., is this actually a Libtool issue?

jsquyres avatar Mar 26 '24 17:03 jsquyres

Is Open MPI getting these flags from GNU Libtool? I.e., is this actually a Libtool issue?

No x 2:

https://github.com/open-mpi/ompi/blob/984944d9d9f3f6eda199fe6a040d65070d3a0745/config/ompi_setup_fc.m4#L236

fxcoudert avatar Mar 26 '24 17:03 fxcoudert

FYI, I have verified that the following works with Xcode 15.3 on Sonoma 14.4; it is the workaround Apple gave me.

I also confirmed it works when gfortran is used to initiate the linker, if -Wl,-ld_classic -Wl,-commons,use_dylibs is used.

% gcc -fPIC -shared extern2.c -o libxxx.so && \
  gfortran -c extern.F90 && ld extern.o libxxx.so \
  -L/opt/homebrew/Cellar/gcc/13.2.0/lib/gcc/current/ -lgfortran \
  -o extern -ld_classic -commons use_dylibs && \
  ./extern ; nm extern | grep MPI
ld: warning: -commons use_dylibs is no longer supported, using error treatment instead
MPIR_F08_MPI_IN_PLACE=0 &MPIR_F08_MPI_IN_PLACE=0x102818000 &MPIR_F08_MPI_IN_PLACE=4337008640
 LOC(MPI_IN_PLACE)=           4337008640
 LOC(buf)=           6134510240
sendbuf=0x102818000, sendbuf=4337008640
sendbuf is MPI_IN_PLACE? yes
recvbuf=0x16da532a0, recvbuf=6134510240
*count=1, *datatype=2, *op=3, *comm=4
         911
                 U _MPIR_F08_MPI_IN_PLACE
                 U _MPI_Allreduce
// extern2.c
#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>

int MPIR_F08_MPI_IN_PLACE;

// Print the sentinel's value and its address (as a pointer and as an integer).
void p(void)
{
    printf("MPIR_F08_MPI_IN_PLACE=%d &MPIR_F08_MPI_IN_PLACE=%p &MPIR_F08_MPI_IN_PLACE=%" PRIdPTR "\n",
            MPIR_F08_MPI_IN_PLACE, (void *)&MPIR_F08_MPI_IN_PLACE, (intptr_t)&MPIR_F08_MPI_IN_PLACE);
}

// Stand-in for the real routine: report whether sendbuf is the sentinel's address.
void MPI_Allreduce(void ** sendbuf, void ** recvbuf,
                   int * count, int * datatype,
                   int * op, int * comm, int * ierror)
{
    printf("sendbuf=%p, sendbuf=%" PRIdPTR "\n", (void *)sendbuf, (intptr_t)sendbuf);
    printf("sendbuf is MPI_IN_PLACE? %s\n",
           (intptr_t)sendbuf==(intptr_t)&MPIR_F08_MPI_IN_PLACE ? "yes" : "no");
    printf("recvbuf=%p, recvbuf=%" PRIdPTR "\n", (void *)recvbuf, (intptr_t)recvbuf);
    printf("*count=%d, *datatype=%d, *op=%d, *comm=%d\n",
            *count, *datatype, *op, *comm);
    *ierror = 911;
}
! extern.F90
module mpi
    use iso_c_binding
    !type(c_ptr), bind(C,name="MPI_F_IN_PLACE") :: MPI_IN_PLACE
    integer(c_int), bind(C, name="MPIR_F08_MPI_IN_PLACE"), target :: MPI_IN_PLACE
    interface
        subroutine p() bind(C,name="p")
        end subroutine
    end interface
    interface
        SUBROUTINE MPI_ALLREDUCE(SENDBUF, RECVBUF, COUNT, DATATYPE, OP, COMM, IERROR) &
                   bind(C,name="MPI_Allreduce")
            use iso_c_binding
            import :: MPI_IN_PLACE
            !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf,recvbuf
            !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf,recvbuf
            !$PRAGMA IGNORE_TKR sendbuf,recvbuf
            !DIR$ IGNORE_TKR sendbuf,recvbuf
            !IBM* IGNORE_TKR sendbuf,recvbuf
            INTEGER(kind=c_int) :: SENDBUF(*), RECVBUF(*)
            INTEGER(kind=c_int) :: COUNT, DATATYPE, OP, COMM, IERROR
        END SUBROUTINE MPI_ALLREDUCE
    end interface
end module mpi

program main
    use mpi
    implicit none
    real :: buf(100)
    integer :: ierror
    call p
    buf = 17
    print*,'LOC(MPI_IN_PLACE)=',LOC(MPI_IN_PLACE)
    print*,'LOC(buf)=',LOC(buf)
    call MPI_ALLREDUCE(MPI_IN_PLACE,buf,1,2,3,4,ierror)
    print*,ierror
end program main

jeffhammond avatar Mar 27 '24 17:03 jeffhammond

I am trying to do the same for Homebrew: https://github.com/Homebrew/homebrew-core/pull/166807

My original analysis was that -ld_classic was not effective anymore, because of the weird warning. But in spite of the warning, the classic linker can still be called that way.

fxcoudert avatar Mar 27 '24 18:03 fxcoudert

this worked for me:

spack install superlu-dist ldflags=-ld_classic

Thanks @jeffhammond @ggouaillardet. My specs are Sonoma 14.2.1 and [email protected] (Xcode 15.3).

neoh54 avatar Mar 28 '24 05:03 neoh54

Oddly enough this worked for me:

brew install gcc-13
../configure --prefix=/opt/extlib/openmpi/5.0.2/gcc/13.2.0 \
        --with-libevent=internal \
        --enable-mpi1-compatibility \
        --enable-static \
        --enable-pmix-timing \
        CC=gcc-13 CXX=g++-13 FC=gfortran-13
make clean
make -j 8
make check
sudo make install

I was unable to install open-mpi 5.0.3 with the same method.

waveman68 avatar Apr 26 '24 09:04 waveman68

Is Open MPI getting these flags from GNU Libtool? I.e., is this actually a Libtool issue?

No x 2:

https://github.com/open-mpi/ompi/blob/984944d9d9f3f6eda199fe6a040d65070d3a0745/config/ompi_setup_fc.m4#L236

I'm sorry for the huge delay here. Thanks for the citation of ompi_setup_fc.m4.

I have Sonoma 14.4.1 with Xcode 15.3 and Homebrew gfortran:

$ gfortran --version
GNU Fortran (Homebrew GCC 13.2.0) 13.2.0

But I don't see these warnings when I compile with the homebrew gfortran.

$ mpifort --showme  
gfortran -I/Users/jsquyres/bogus/include -Wl,-flat_namespace -Wl,-commons,use_dylibs -I/Users/jsquyres/bogus/lib -L/Users/jsquyres/bogus/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
$ mpifort hello_usempif08.f90 -o hello  
$

What is different between my setup and yours?

jsquyres avatar May 02 '24 15:05 jsquyres

But I don't see these warnings when I compile with the homebrew gfortran.

We have re-enabled the "classic linker" in Homebrew gfortran at some point.

fxcoudert avatar May 02 '24 16:05 fxcoudert

We have re-enabled the "classic linker" in Homebrew gfortran at some point.

Ah, gotcha. Is this issue moot, then? Or do we still need to investigate the use of -commons use_dylibs?

I see some comments in our code that these flags were necessary at some point, but I'm afraid I don't know/remember why they were necessary (i.e., to know if they are still necessary).

jsquyres avatar May 02 '24 19:05 jsquyres

They are still necessary. They are incompatible with Apple's new linker, hence we (for now) rely on the old linker.

fxcoudert avatar May 02 '24 20:05 fxcoudert

Ok. Given that homebrew gfortran has updated, should we close this issue?

jsquyres avatar May 02 '24 20:05 jsquyres

Well, it's a workaround, not a proper fix: at some point the "classic linker" might not be supported by Apple anymore. Maybe an alternative implementation is possible?

fxcoudert avatar May 02 '24 21:05 fxcoudert

Let me make sure I'm parsing your reply correctly:

  • You are saying that -commons use_dylibs is still necessary.
  • As such, Open MPI still needs to use these flags, and gfortran still needs to support them.
  • However, this solution uses Apple's old/classic linker, which could disappear someday. Hence, a different solution should be found.

Is that correct?

If so, can you explain / remind me why we need -commons use_dylibs / what those flags do?

jsquyres avatar May 02 '24 21:05 jsquyres

All of that is correct.

If so, can you explain / remind me why we need -commons use_dylibs / what those flags do?

The Fortran part of open-mpi uses them for common blocks. I haven't dug more on how and why.

fxcoudert avatar May 02 '24 21:05 fxcoudert

Ah, yes, we do use some common blocks for sentinel values (i.e., they really have to be global so that we can look for them by address, not by value):

https://github.com/open-mpi/ompi/blob/ce3742c97821ee30ff5cbefda192f3c3754eb353/ompi/include/mpif-sentinels.h#L60-L68

jsquyres avatar May 02 '24 22:05 jsquyres

Also, I have tested and even if MPI wasn't using COMMON, the same linker behavior is required for Fortran module data to work properly, so one cannot argue that Apple is trying to force Fortran developers to stop using COMMON (which might be laudable in some contexts).

jeffhammond avatar May 03 '24 17:05 jeffhammond

@fxcoudert MPI implementations have to use COMMON for these. It's necessary because of how the MPI standard defines mpif.h and is furthermore required in the MPI Fortran modules until mpif.h is deleted, because sentinels are required to be interoperable across all MPI Fortran header/module usage.

It might be possible to define MPI_ANY_SOURCE (e.g.) as module data in the MPI modules, but then sentinel detection is two branches instead of one. I have not studied this in every detail to know if it's strictly valid or not, because there are a lot of edge cases to think about (such as Fortran code that uses the COMMON sentinel passing that argument into Fortran code that uses the module interfaces).

jeffhammond avatar May 03 '24 17:05 jeffhammond

Hello

I am running macOS 14.5 on Apple M1, with Xcode 15.4, gcc-14, g++-14 and gfortran-14.

I compiled open-mpi-5.0.3 :

configure --prefix=$APP_DIR/openmpi-5.0.3 FC=gfortran-14 CC=gcc-14 CXX=g++-14 --with-pmix=internal --with-libevent=internal --with-hwloc=internal
make
make install

Then, when I compile a program, I hit a similar problem:

ld: warning: -commons use_dylibs is no longer supported, using error treatment instead
ld: common symbol '_mpi_fortran_argv_null_' from '/Users/chris/Builds/gnu14/paradigm/test/CMakeFiles/pdm_t_closest_points_f.dir/pdm_t_closest_points_f.f90.o' conflicts with definition from dylib '_mpi_fortran_argv_null_' from '/Users/chris/Applications/gnu14/openmpi-5.0.0/lib/libmpi_usempif08.40.dylib'
collect2: error: ld returned 1 exit status

tof92130 avatar May 15 '24 12:05 tof92130

For anyone else running into this, I found that each of the following works around it:

  • Compile openmpi with --with-wrapper-fcflags=-Wl,-ld_classic
  • Edit openmpi-install-prefix/share/openmpi/mpifort-wrapper-data.txt and add "-Wl,-ld_classic" to the linker_flags=-L${libdir} line
  • Set LDFLAGS="-Wl,-ld_classic"
  • If using spack to build a library that uses a broken mpifort: spack install my-package ldflags="-Wl,-ld_classic"

Chrismarsh avatar Jun 05 '24 15:06 Chrismarsh

The fix for this has been merged into main, v4.1.x, and v5.0.x. It will be included in the next releases of v4.1 and v5.0.

Thank you!

jsquyres avatar Jul 11 '24 16:07 jsquyres

Great news @jsquyres! Just to be clear, this will be in >=5.0.4? I.e., it isn't being backported?

Chrismarsh avatar Jul 11 '24 19:07 Chrismarsh

@Chrismarsh This will be available in 5.0.4

wenduwan avatar Jul 11 '24 19:07 wenduwan

It's going to be in v4.1.7, too. We keep promising to get v4.1.7 out "someday", but there hasn't been an urgent need yet.

It will not be in any v4.0.x release -- that series is dead.

jsquyres avatar Jul 11 '24 20:07 jsquyres