[MPICH] Build for device ch4
cc @giordano
Should we switch to ch4: https://github.com/JuliaParallel/MPI.jl/issues/720#issuecomment-1458606823?
I guess? I don't really know what it is, but if that is what is being maintained we should switch.
Uhm, ch4 is actually the default. We switched to ch3 in #3185, specifically in https://github.com/JuliaPackaging/Yggdrasil/commit/0104fa326a72bfc28c6376f665f4fe0f23471cec, but I can't really tell why :confused:
Let's try it and see!
Well, it turns out that compilation of ch4 with --enable-fast=all,O3 is horribly slow, some jobs are even timing out after 3 hours. In particular, compilation of src/mpi/coll/mpir_coll.c takes literally hours. The debug builds I did in https://github.com/JuliaParallel/MPI.jl/issues/720#issuecomment-1458832642 were pretty fast (~6 minutes) instead.
And compilation for many platforms actually fails, in many different ways, including choking on broken assembly, triggering ICEs, not finding header files and so on. This looks quite broken.
We copied --enable-fast=all,O3 from Homebrew, but they also use --enable-g=dbg, which is also the option I used in https://github.com/JuliaParallel/MPI.jl/issues/720#issuecomment-1458832642. I wonder if that avoids compilation going completely haywire. We had some problems with debug builds in the past (see #4071 and #4039), but that specific problem should have been fixed by https://github.com/pmodels/mpich/pull/5720. Alternatively, we could try removing O3, keep the default optimisation O2, and see what happens in that case.
Do you want to try building with a different compiler? Maybe this resolves the long build times.
We're using three different compilers already, because of libgfortran (and for libgfortran 4 there isn't much choice, there's only gcc 7)
Ch4 is too finicky, compilation goes wrong in too many ways. I'm going to merge #6357 instead and we'll play with compiler flags another time.
Ok, as suggested in https://github.com/pmodels/mpich/issues/6434 --enable-fast=O3,ndebug --with-device=ch4 has a much more reasonable build time (we're still suffering from very slow downloads on CI, so the total duration of the jobs is often much longer than the build itself).
There are a couple of problems:
- it doesn't build on FreeBSD because asm/types.h is missing, we might have to use ch3 on this platform
- the size of libmpi increases 10x, from 5 MB to 50 MB
@simonbyrne @eschnett what do you suggest to do? Go with ch4 (except on FreeBSD) and take the better performance, or stay with ch3 and keep a much smaller library?
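For reference, the "ch4 everywhere except FreeBSD" option could look roughly like the sketch below. It assumes the build script can see the target triplet as a string (for example x86_64-unknown-freebsd13.2); the helper names select_mpich_device and mpich_configure_flags are made up for illustration and are not taken from the actual recipe.

```shell
# Hypothetical helper: choose the MPICH device for a given target triplet.
# ch4 is used everywhere except FreeBSD, where the ch4 build fails because
# asm/types.h is missing.
select_mpich_device() {
    case "$1" in
        *freebsd*) echo "ch3" ;;
        *)         echo "ch4" ;;
    esac
}

# Hypothetical helper: assemble the configure flags discussed in this thread
# for a given target triplet.
mpich_configure_flags() {
    echo "--enable-fast=O3,ndebug --with-device=$(select_mpich_device "$1")"
}
```

The output of `mpich_configure_flags "$target"` would then be passed to `./configure` along with whatever other options the recipe already uses.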
I have no idea! I wonder what is the cause of the file size changes? Is it pulling in other libraries?
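One way to check whether the larger libmpi pulls in other libraries is to list its dynamic dependencies. A minimal sketch, assuming a Linux ELF build and readelf from GNU binutils; the helper name needed_libs is hypothetical:

```shell
# Hypothetical helper: reduce `readelf -d` output (read from stdin) to the
# names of the shared libraries listed as NEEDED.
needed_libs() {
    sed -n 's/.*(NEEDED).*\[\(.*\)]$/\1/p'
}
```

Running `readelf -d path/to/libmpi.so | needed_libs` on both the ch3 and the ch4 build (the path is a placeholder) and comparing the two lists would show whether new dependencies appear. If the lists match, `size path/to/libmpi.so` can show whether the growth is in the code or the data sections.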
I'm leaning towards not doing this: MPICH_jll can remain a simple package, if users want something more performant they probably want to use system MPI anyway. @vchuravy opinions?
I am skewed towards providing a more performant default, so ch4 gets my vote
I would install MPICH in the same way Debian, Ubuntu, or Homebrew install it. That would be a reasonable expectation to have. Alternatively we can switch to using OpenMPI by default instead.
Ubuntu's MPICH recipe contains these lines:
dh_auto_configure -- $(extra_flags) CPPFLAGS="" CFLAGS="" CXXFLAGS="" FFLAGS="$(FFLAGS)" FCFLAGS="$(FFLAGS)" BASH_SHELL=/bin/bash
( cd src/pm/hydra && ./configure --with-hwloc-prefix=/usr $(DEVICE) FFLAGS="$(FFLAGS)" --prefix=/usr )
so apparently there's a thing called hydra that is installed separately -- I don't know why. Apart from this Ubuntu doesn't use any special options and doesn't choose a device.
I would not provide a slow (what does this mean?) MPI implementation by default. I would also not start by expecting people to switch to a system MPI. If they have a laptop or a workstation, or even a small cluster, then Julia's default install should do something reasonable, good enough for interactive use and getting people hooked on Julia+MPI. In my mind this means that shared memory + Gigabit ethernet should work efficiently by default.
> so apparently there's a thing called hydra that is installed separately -- I don't know why. Apart from this Ubuntu doesn't use any special options and doesn't choose a device.
hydra is the MPICH launcher.
#10249 switched to ch4