Improve MPI detection

Open paugier opened this issue 1 year ago • 11 comments

Fixes #7045, #9637 and #13615. See also the old PR #7373.

This PR improves support for MPI detection on Unix, which is currently quite broken. I didn't try to study what happens on Windows.

  • Currently, Meson only supports IntelMPI with Intel compilers and OpenMPI with other compilers. MPICH is not supported at all. (see #7045 and #13615)

  • Currently, pkg-config takes priority over detecting mpicc/mpiicc (and friends), which can lead to very unexpected results: only OpenMPI supports pkg-config, and all implementations (at least on Unix) state that the correct way to detect and use MPI is through mpicc/mpiicc (and friends). As a result, if the system pkg-config gives a positive result, Meson detects OpenMPI even when PATH has been correctly modified so that mpicc points to another implementation.

  • Environment variables such as I_MPI_CC are wrongly taken into account, although they are internal to IntelMPI (see #9637).

This PR improves the situation. The detection considers (in this order):

  • The standard environment variables MPICC (and friends)
  • For Intel compilers, the commands mpiicc (and friends)
  • The commands mpicc (and friends)
  • Finally, pkg-config.
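
The priority order listed above can be sketched in a few lines of Python. This is only an illustration, not the code in this PR: the function `detection_plan` and the three tables are hypothetical, the environment variable names other than MPICC are assumptions about what "and friends" covers, and the pkg-config module names other than ompi-fort (which appears in the log further down) are assumed as well.

```python
import os

# Hypothetical lookup tables; only MPICC, mpicc, mpiicc and ompi-fort
# are named in this thread, the other entries are assumptions.
ENV_VARS = {"c": ["MPICC"], "cpp": ["MPICXX"], "fortran": ["MPIFC", "MPIF90", "MPIF77"]}
INTEL_WRAPPERS = {"c": ["mpiicc"], "cpp": ["mpiicpc"], "fortran": ["mpiifort"]}
GENERIC_WRAPPERS = {
    "c": ["mpicc"],
    "cpp": ["mpic++", "mpicxx"],
    "fortran": ["mpifort", "mpif90", "mpif77"],
}
PKG_CONFIG_MODULES = {"c": "ompi-c", "cpp": "ompi-cxx", "fortran": "ompi-fort"}


def detection_plan(language, intel_compiler, env=None):
    """Return (method, name) pairs in the order they would be tried."""
    env = os.environ if env is None else env
    plan = []
    # 1. Standard environment variables (MPICC and friends).
    for var in ENV_VARS[language]:
        if env.get(var):
            plan.append(("env", env[var]))
    # 2. Intel-specific wrappers, only when an Intel compiler is in use.
    if intel_compiler:
        plan += [("wrapper", name) for name in INTEL_WRAPPERS[language]]
    # 3. Generic wrappers (mpicc and friends).
    plan += [("wrapper", name) for name in GENERIC_WRAPPERS[language]]
    # 4. pkg-config as the last resort.
    plan.append(("pkg-config", PKG_CONFIG_MODULES[language]))
    return plan


if __name__ == "__main__":
    for step in detection_plan("fortran", intel_compiler=True, env={}):
        print(step)
```

The point of the ordering is that a wrapper found through the environment or PATH always wins over whatever the system pkg-config happens to report.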

OpenMPI, MPICH, IntelMPI and other compatible implementations should be supported with different compilers. This includes, in particular, OpenMPI compiled with Intel compilers and IntelMPI compiled with GCC.

MPI can be a bit tricky and there are a lot of cases to consider. It would be interesting to get the point of view of people using MPI on different clusters. CC @RemiLacroix-IDRIS, @scivision, @rcoacci, @nordmoen, @acroucher.

Note: The output of mpicc -v is not standardized. I tried to support the output I get from different implementations, but there might be other cases. In particular, I removed a line with v = re.search(r'(\d{4}) Update (\d)', out).

paugier avatar Aug 30 '24 15:08 paugier

Note: The output of mpicc -v is not standardized. I tried to support the output I get from different implementations, but there might be other cases. In particular, I removed a line with v = re.search(r'(\d{4}) Update (\d)', out).

I think that's needed for older versions of Intel MPI. For example, here is the output I get:

$ mpiifort -v
mpiifort for the Intel(R) MPI Library 2019 Update 9 for Linux*
Copyright 2003-2020, Intel Corporation.
ifort version 19.1.3.304

RemiLacroix-IDRIS avatar Sep 02 '24 14:09 RemiLacroix-IDRIS

I think that's needed for older versions of Intel MPI.

Thanks. This output should be supported now.
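
As an illustration of why the wrapper output needs several patterns, here is a minimal Python sketch that extracts a version from the first line of `-v` output. It is not the code in this PR: `parse_mpi_version` is a hypothetical helper, the "year.update" format returned for older Intel MPI wrappers is an assumption, and only the two first lines quoted in this thread are covered.

```python
import re


def parse_mpi_version(first_line):
    """Extract a version string from the first line of `<wrapper> -v` output."""
    # Older Intel MPI wrappers, e.g.
    # "mpiifort for the Intel(R) MPI Library 2019 Update 9 for Linux*"
    match = re.search(r"(\d{4}) Update (\d+)", first_line)
    if match:
        # Assumed format: "<year>.<update>"
        return f"{match.group(1)}.{match.group(2)}"
    # MPICH-style wrappers, e.g. "mpifort for MPICH version 4.0"
    match = re.search(r"version (\d+(?:\.\d+)*)", first_line)
    if match:
        return match.group(1)
    # Unknown format: let the caller fall back to another method.
    return None


if __name__ == "__main__":
    print(parse_mpi_version(
        "mpiifort for the Intel(R) MPI Library 2019 Update 9 for Linux*"))
    print(parse_mpi_version("mpifort for MPICH version 4.0"))
```

Only the first line is examined on purpose: later lines come from the underlying compiler (for example "ifort version 19.1.3.304") and would otherwise be mistaken for the MPI version.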

paugier avatar Sep 02 '24 15:09 paugier

Thanks for this! I just installed your branch in a venv and tried building a minimal Fortran program that links to MPI, on an Ubuntu 22.04 machine with mpich installed. Unfortunately it didn't seem to detect MPI - not sure how much testing you've done with Fortran? (It's also possible I'm doing something in an outdated way, as I've been stuck with using Meson 0.53 since mpich detection was broken.)

Here is the program and the meson build file:

program foo

  use mpi
  implicit none

  write(*,*) 'foo!'

end program foo

project('foo', 'fortran', version: '0.0.1')

mpi = dependency('mpi', language: 'fortran')
foo = executable('foo', ['foo.F90'], dependencies: [mpi])

and here is the relevant part of the meson-log.txt:

Found pkg-config: YES (/usr/bin/pkg-config) 0.29.2
Determining dependency 'ompi-fort' with pkg-config executable '/usr/bin/pkg-config'
env[PKG_CONFIG_PATH]: /home/acro018/lib/pkgconfig
env[PKG_CONFIG]: /usr/bin/pkg-config
-----------
Called: `/usr/bin/pkg-config --modversion ompi-fort` -> 1
stderr:
Package ompi-fort was not found in the pkg-config search path.
Perhaps you should add the directory containing `ompi-fort.pc'
to the PKG_CONFIG_PATH environment variable
No package 'ompi-fort' found
-----------
mpifort binary missing from cross or native file, or env var undefined.
Trying a default mpifort fallback at mpifort
Trying a default mpifort fallback at mpif90
Trying a default mpifort fallback at mpif77
mpifort found: NO
Run-time dependency MPI for fortran found: NO (tried pkgconfig, config-tool and system)

meson.build:3:6: ERROR: Dependency "mpi" not found, tried pkgconfig, config-tool and system

Here is the output from mpicc --version:

gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

and mpifort --version:

GNU Fortran (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

acroucher avatar Sep 02 '24 23:09 acroucher

Thanks a lot for the testing @acroucher. I didn't try Fortran so it's not too surprising that it fails. I should be able to try this and hopefully fix this issue.

paugier avatar Sep 03 '24 07:09 paugier

@acroucher can you double-check that you are really using the branch paugier:mpi-detection? From your log, I see that pkg-config is used first, which shouldn't be the case with my branch.

I tried your small example in a conda environment created with conda create -n env-mpich-fortran mpich fortran-compiler and MPI is correctly detected here.

mpifort found: YES (/data/mambaforge/envs/env-mpich-fortran/bin/mpifort) 4.2.2
Run-time dependency MPI for fortran found: YES 4.2.2

Can you please also provide the output of

mpifort -show
# and
mpifort -v  # (only first line)

paugier avatar Sep 03 '24 08:09 paugier

can you double-check that you are really using the branch paugier:mpi-detection?

Doh! you're right, I forgot to check out the branch. I ran it again on the correct branch and it works fine. Brilliant! I also tested it on some real code and that worked too. Thank you!

Can you please also provide the output of

And here are the outputs from mpifort -show and mpifort -v:

mpifort -show
gfortran -O2 -ffile-prefix-map=/build/mpich-0xgrG5/mpich-4.0=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -fallow-invalid-boz -fallow-argument-mismatch -Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro -I/usr/include/x86_64-linux-gnu/mpich -I/usr/include/x86_64-linux-gnu/mpich -L/usr/lib/x86_64-linux-gnu -lmpichfort -lmpich

mpifort -v
mpifort for MPICH version 4.0
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.4.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)

acroucher avatar Sep 03 '24 23:09 acroucher

I edited the history as suggested. I hope I did it correctly. I get:

@  changeset:   16979:3dc06bdb03d3
|  bookmark:    mpi-detection
|  tag:         default/mpi-detection
|  tag:         tip
|  user:        paugier <[email protected]>
|  date:        Mon Sep 02 17:46:08 2024 +0200
|  summary:     MPI detection: get version from old IntelMPI wrappers
|
o  changeset:   16978:025928ebca77
|  parent:      16975:5d46b2165d8f
|  user:        paugier <[email protected]>
|  date:        Fri Aug 30 17:29:10 2024 +0200
|  summary:     MPI detection: support more implementations (any compilers)
|
o  changeset:   16975:5d46b2165d8f
|  user:        paugier <[email protected]>
|  date:        Wed Sep 04 23:15:56 2024 +0200
|  summary:     MPI detection: mpicc/mpiicc before pkg-config
|
o  changeset:   16974:c34a0170dda1
|  bookmark:    master
|  tag:         upstream/master
|  user:        Dylan Baker <[email protected]>
|  date:        Mon Aug 19 14:29:44 2024 -0700
|  summary:     mformat: provide nice error message instead of backtrace for invalid value

which is quite reasonable (except maybe the dates). However, it seems that Github sorts the commits by date, which is a bit wrong here.

paugier avatar Sep 04 '24 21:09 paugier

which is quite reasonable (except maybe the dates). However, it seems that Github sorts the commits by date, which is a bit wrong here.

That looks like mercurial output, so perhaps it's about the export format. In git on the command-line they are ordered correctly, but I also notice that the mercurial "date" field is mapped to both "AuthorDate" and "CommitDate" in git. Usually when applying/reordering commits in git, the CommitDate is the date that you performed the reordering, and I am pretty sure that's what github uses to order its history flow since it also wants to interweave commits in between the comments surrounding when the commits were made.

Weird UX issue, I guess. :) Works locally though.

eli-schwartz avatar Sep 04 '24 23:09 eli-schwartz

Finding MPI is quite a task. For reference:

scivision avatar Sep 05 '24 00:09 scivision

Finding MPI is quite a task.

Indeed! It seems reasonable to me to proceed step by step, especially because it is very difficult (or impossible) to test all the different possibilities.

The first step (this PR) is to fix Meson for the very common cases (in particular MPICH, OpenMPI with Intel compiler and IntelMPI with GCC).

At the very least, once Meson has good support for the common cases, more projects relying on MPI will be able to use Meson, and we can expect further bug reports that will allow us to support more cases.

The next step after this PR would be to look at what happens on Windows.

paugier avatar Sep 05 '24 07:09 paugier

@eli-schwartz, is there something else I need to do on this PR? It seems to me that it really improves the situation.

paugier avatar Sep 10 '24 08:09 paugier

@eli-schwartz gentle ping. Could I get more feedback on this PR? Is there an issue? Should I do something more?

paugier avatar Sep 17 '24 13:09 paugier

Other than the loop thing, this looks super reasonable to me.

dcbaker avatar Sep 17 '24 20:09 dcbaker

I updated the PR. @eli-schwartz and @dcbaker, do I need to do anything more?

paugier avatar Sep 20 '24 11:09 paugier

Fixes #7045, #9637 and #13615. See also the old PR #7373.

Aside: github unfortunately doesn't actually figure this information out on its own. It requires you to repeat the "fixes" word for each ticket if you want merging this PR to automatically close the relevant issues.

eli-schwartz avatar Sep 23 '24 23:09 eli-schwartz

I changed the description of this PR to automatically close the relevant issues.

paugier avatar Sep 24 '24 12:09 paugier

I rebased to fix a lint issue (https://github.com/mesonbuild/meson/pull/13695).

paugier avatar Sep 24 '24 12:09 paugier

Thank you for persevering. :)

eli-schwartz avatar Sep 24 '24 15:09 eli-schwartz

Thanks @eli-schwartz and @dcbaker for your reviews.

Now I would like to use this on conda-forge, Spack and Guix, but I will have to wait for a new Meson release.

I see that Meson 1.5.2 was released just a few days ago (https://pypi.org/project/meson/#history). Do you think we could have a new release (1.5.3?) in a few days or weeks?

I realize that pushing a new version on PyPI for Meson might not be as straightforward as for simpler and less popular projects. So I'm just respectfully asking...

paugier avatar Sep 25 '24 11:09 paugier

Unfortunately I don't think we can backport it at all. It changes the "shape" of the pickled coredata which means that existing build directories configured with meson 1.5.2 would be binary incompatible with meson 1.5.3, leading to crashes. A new major.minor release forces a full from-scratch reconfigure instead and doesn't load e.g. things like cached dependency lookups.

It's a good question though, in general, because I'm always happy to consider nominating a patch for backporting, and I'd like to get us into the situation where we issue new bugfix releases once every couple of weeks in the event that there are patches that have been nominated.

eli-schwartz avatar Sep 25 '24 11:09 eli-schwartz

Oh, that's bad news. If I understand correctly, it means that these fixes will only be included in Meson 1.6.0, which could be available in a few months, judging from the release history.

Without a release on PyPI containing these fixes, it seems to me that it's impossible to test with conda-forge and Spack.

In practice, it means that Python projects using Meson and MPI cannot be installed on clusters using MPICH with standard install procedures (one would need to install Meson from source by hand, along with all build dependencies, and run pip install --no-build-isolation ...).

The next question is then: do you have an idea of when Meson 1.6.0 could be released? If it is a matter of approximately one month, I would wait. However, if it could take a few months, I would have to switch a few projects back to another backend (I guess setuptools). Note that it would not be a huge issue, just some work that is neither very useful nor interesting.

paugier avatar Sep 25 '24 14:09 paugier

Considering our usual release cadence, meson 1.6.0rc1 should probably be released within the month.

It's also possible to test with conda-forge and spack, if you make a VCS package available for meson. For example, in Gentoo these are called "live" ebuilds, you install "meson==9999" and it always fetches the latest code from git master. In Arch, the same thing is called "meson-git==1.5.0.r214.g1aac6cc1e"

You can also specify a dependency in pyproject.toml as:

  • meson @ git+https://github.com/mesonbuild/meson
  • meson @ git+https://github.com/mesonbuild/meson@sha1

if you want to guarantee that pip install and build isolation pulls in a version of meson that you know has all the features you want. pip install will anyways not preserve build directories by default, so no incremental builds of an existing worktree and therefore no worry about binary coredata compatibility.

But we really should be having a new release (candidate) within the month. The previous release was July 10, it's been 2.5 months since then, and we try to have a new release once every 3 or 4 months, which means we should have one anywhere from 2 weeks to 1.5 months from now -- and if we assume the outer limit of 1.5 months, we still need to put out release candidates at least 3 weeks ahead of time.
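
The date arithmetic in the paragraph above can be checked with a short Python sketch. The year 2024 for the July 10 release and September 25, 2024 as "now" are taken from the dates of the comments in this thread; the variable names are illustrative only.

```python
from datetime import date, timedelta

previous_release = date(2024, 7, 10)  # "The previous release was July 10"
today = date(2024, 9, 25)             # date of the comment above

# A release every 3 or 4 months gives this window for the next one.
earliest = previous_release.replace(month=previous_release.month + 3)
latest = previous_release.replace(month=previous_release.month + 4)

days_since_release = (today - previous_release).days  # 77, about 2.5 months
days_to_earliest = (earliest - today).days            # 15, about 2 weeks
days_to_latest = (latest - today).days                # 46, about 1.5 months

# Release candidates go out at least 3 weeks before the final release.
latest_rc = latest - timedelta(weeks=3)

if __name__ == "__main__":
    print(f"window: {earliest} to {latest}")
    print(f"days from now: {days_to_earliest} to {days_to_latest}")
    print(f"latest date for a first release candidate: {latest_rc}")
```

Even at the outer limit of the window, the first release candidate would be due by October 20, which is within a month of the comment.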

eli-schwartz avatar Sep 25 '24 15:09 eli-schwartz

@paugier,

the plan is to tag the first release candidate next Sunday.

eli-schwartz avatar Sep 29 '24 17:09 eli-schwartz

@paugier,

There is an rc1 released today and available for installation via PyPI. I packaged it in Gentoo for the benefit of people doing prerelease testing, but I don't know the policies of conda or spack around that sort of thing.

Please test this prerelease if you can, to help ensure the release occurs as smoothly as possible.

We hope to release the final 1.6.0 release in one week's time. Alas, life finds a way, and on average it's common to find at least one regression serious enough to make a second release candidate to allow people to test that fix; if this should happen, we expect the final release to happen in two weeks' time instead.

eli-schwartz avatar Oct 06 '24 23:10 eli-schwartz

@eli-schwartz

I'm finally trying 1.6.0.rc1 and ~~it seems that there is a problem~~. I'm trying to understand.

paugier avatar Oct 11 '24 13:10 paugier

It was my mistake. Just for the record, a "simple" way to check:

conda create -n env-mpich mpich cxx-compiler python=3.12 -y
conda activate env-mpich
conda install mpi4py fftw=*=mpi* fluidfft pkg-config meson-python cython fluidfft-builder -y
pip install meson --pre -U
pip install fluidfft-fftwmpi --no-build-isolation -v

which gives:

  mpic++ found: YES (/home/pierre/mambaforge/envs/env-mpich/bin/mpic++) 4.2.3

mpich can be replaced by openmpi or impi-devel.

paugier avatar Oct 11 '24 15:10 paugier