Restore ability to use MPI implementations with unknown ABIs

sloede opened this issue 2 years ago • 10 comments

Based on comments by @vchuravy and as far as I understand from the code here, with the new MPIPreferences/"build-less" approach there is currently no way to use MPI.jl with MPI backends that are not already known to it. IMHO this is very unfortunate, since up to and including the current release of MPI.jl it was possible to use any MPI implementation and have the auto-detection mechanism figure out the ABI.

At the moment, using Julia with an unknown MPI implementation seems to be possible only via MPItrampoline. While MPItrampoline is a great tool and will certainly (hopefully) make things much smoother in the future, it is still comparatively new and has not yet taken hold in most supercomputing centers. As a result, HPC systems with incompatible MPI ABIs (such as HPE's MPT, which is not compatible with any other MPI ABI) are precluded from using MPI.jl.

Since the current MPI.jl release still works technically flawlessly with unknown MPI implementations (at least on our system with HPE's MPT), I strongly suggest that, for the time being, we restore the ability to support MPI ABIs other than the big three plus MPItrampoline. Ideally, one could have a (non-exported?) function that triggers the generation of an MPI constants file, which one could either feed locally into one's own MPI.jl installation (e.g., via preferences) or use as the basis for a PR adding a new officially supported ABI to MPI.jl (where appropriate). Otherwise it becomes much harder to support Julia with MPI on systems such as HLRS's Hawk, where the default MPI implementation is MPT and most available parallel tools such as HDF5 are provided for MPT.

cc @luraess

sloede avatar Apr 21 '22 06:04 sloede

So I see two solutions. The first is to provide a script that generates a const.jl, which we put into a Scratch.jl space and load via Preferences. We could do that during use_system_binary to keep my ideal of not having a build step. I am a bit wary of that, since we will need to find a way to deal with multiple MPI versions, and we may want to change the layout of that file.
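
A minimal sketch of what the first option's plumbing could look like; the "consts_file" preference key is hypothetical, and the generator script itself is omitted:

```julia
using Scratch, Preferences, MPIPreferences

# Per-package scratch space that persists across sessions (key is illustrative).
consts_dir = get_scratch!(MPIPreferences, "mpi_consts")
consts_file = joinpath(consts_dir, "consts.jl")

# ... here the generator script would probe the system MPI and write consts_file ...

# Record the location so MPI.jl can include() the file at load time.
set_preferences!(MPIPreferences, "consts_file" => consts_file)
```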

Secondly, we could use MPItrampoline's libmpiconstants and allow users to set its path using preferences. This would not use MPItrampoline for the calls themselves, but only as a way to look up the constants for the unknown MPI ABI.
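
For the second option, the user-facing side might be no more than a preference pointing at the library; the preference key and path here are illustrative:

```julia
using Preferences, MPIPreferences

# Tell MPI.jl where to find the constants-lookup library so it can
# dlopen it at load time (hypothetical preference key; path is site-specific).
set_preferences!(MPIPreferences,
    "libmpiconstants" => "/opt/mpitrampoline/lib/libload_time_mpi_constants.so")
```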

vchuravy avatar Apr 21 '22 06:04 vchuravy

Re/ the first option: I am not sure we need to provide this with the described level of automation. The ability to generate the const.jl file (placing it in a given directory, defaulting to .) and then to specify the file's location in the preferences (maybe even manually) could be enough. At the moment this feels like a supercomputer-center-specific use case, so IMHO it does not need the same polish as a "regular" user-facing feature.

Re/ the second option, I cannot really say since I do not know what following this path would entail.

sloede avatar Apr 22 '22 05:04 sloede

(I didn't see this discussion earlier, and we discussed on Discord.)

The earlier MPI.jl mechanism for figuring out the ABI worked for about 95% of the ABI; the rest was hard-coded for known implementations (MPICH or OpenMPI) or did not work. Support for Microsoft MPI was completely hard-coded.

The MPIconstants project hosts a small C program that extracts the constants from any MPI implementation. We could build and run this at configure time.

This generates two files. One is a Julia file that defines the compile-time constants. The other is a C file that extracts the load-time constants at run time and defines global variables that can be read from Julia; this file needs to be built as a shared library and loaded from Julia.
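
To make the second file concrete: a sketch of how Julia could read one such load-time constant from the generated shared library (the library and symbol names are made up for illustration):

```julia
using Libdl

# Open the generated constants library (name is illustrative).
lib = Libdl.dlopen("libload_time_mpi_constants.so")

# Assume the C side defines a global initialized from the real MPI library,
# e.g.:  void *CONSTANT_MPI_STATUS_IGNORE = MPI_STATUS_IGNORE;
sym = Libdl.dlsym(lib, :CONSTANT_MPI_STATUS_IGNORE)
const MPI_STATUS_IGNORE = unsafe_load(convert(Ptr{Ptr{Cvoid}}, sym))
```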

Thus the process of making a new MPI implementation available to Julia is quite automated. If we replace the CMake build system with custom code, then we could easily run this at MPI.jl build time.

eschnett avatar Apr 26 '22 18:04 eschnett

This sounds like a potential solution. However, I strongly suggest that if we go down this path, the ability to compile a C program to figure out the constants should be included in MPI.jl itself rather than relying on downloading yet another repository.

sloede avatar Apr 27 '22 04:04 sloede

[...] then we could easily run this at MPI.jl build time.

Question from the sideline: Wasn't the idea to not have a build step anymore?

(Oh I see, Valentin already mentioned this above)

carstenbauer avatar Apr 27 '22 08:04 carstenbauer

For clarity: The build step would only be necessary if someone uses MPI.jl with an unknown MPI implementation, i.e. an MPI implementation different from MPICH, OpenMPI, Microsoft MPI, or MPItrampoline. This build step would only be needed by experts, and in practice, the MPI.jl maintainers would be there to offer help.

eschnett avatar Apr 27 '22 17:04 eschnett

As just discussed during the monthly Julia for HPC call (thanks also to @mkitti @giordano @williamfgc for the discussion):

Maybe it would be sufficient for now to restore the ability to use a custom ABI file (such as https://github.com/JuliaParallel/MPI.jl/blob/master/src/consts/mpt.jl or https://github.com/JuliaParallel/MPI.jl/blob/master/src/consts/mpich.jl) on a machine where the system MPI implementation is not compatible with any of the ABIs supported by MPI.jl. That is, we could add an additional keyword argument abi_file to MPIPreferences.use_system_binary https://github.com/JuliaParallel/MPI.jl/blob/112c72389047d1999a6535deed4d1b9eea3e0e33/lib/MPIPreferences/src/MPIPreferences.jl#L122-L128 which would default to nothing. If users want to support a custom MPI ABI, they could pass the path to a manually generated ABI file, in which case the abi keyword argument would be ignored. That way, users would be able to use MPI.jl as an installed package, and system administrators could provide this as a default on a compute cluster.

This approach would still require users to manually create the ABI file, but at least it allows customization without having to clone MPI.jl and hack it in. We could also add a few sentences to the docs explaining the basic steps needed to create your own ABI constants file.
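
A sketch of how the proposed keyword might look from the user's side; abi_file does not exist in MPIPreferences today, so this is purely illustrative:

```julia
using MPIPreferences

# Hypothetical usage of the proposed keyword: a non-nothing `abi_file`
# would take precedence over the `abi` keyword.
MPIPreferences.use_system_binary(;
    library_names = ["libmpi"],
    mpiexec = "mpiexec",
    abi_file = "/path/to/mpt_consts.jl",
)
```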

What do you think about this idea? It would be great to get some feedback from both the MPI.jl maintainers' perspective (@simonbyrne @vchuravy) and that of other supercomputer operators (e.g. @omlins @carstenbauer).

sloede avatar May 24 '22 19:05 sloede

I'm not opposed to it, but I do wonder as to the utility: are there more ABIs in the wild?

simonbyrne avatar May 24 '22 19:05 simonbyrne

I'm not opposed to it, but I do wonder as to the utility: are there more ABIs in the wild?

We had this discussion yesterday as well. For the majority of systems, the answer is no: most university and/or commodity clusters are likely to use one of the "big two", i.e., either MPICH (or something compatible) or OpenMPI. However, especially on leadership systems, vendors tend to provide their own MPI implementations, which may or may not be compatible with MPICH. Sometimes the implementations are "mostly" compatible but have some peculiarities.

We had a longer discussion yesterday on how to proceed with this issue. It is likely that currently nobody has the motivation to recreate the auto-detection system used until v0.19, since nobody presently involved in MPI.jl has an issue with unknown ABIs anymore (myself included). This most recent proposal is thus a compromise between minimizing developer effort and not closing the door on not-yet-officially-supported ABIs.

sloede avatar May 25 '22 04:05 sloede

I don't think it's necessary to re-introduce this ability. There are other issues where our time is better spent.

Apart from this, and for the record: The package MPIconstants does just this. It compiles two small C files that output the requested information, both the compile-time settings (e.g. how MPI handles are implemented) and run-time constants via a shared library (e.g. the value of MPI_STATUS_IGNORE etc.). I use this package to generate the settings for MPItrampoline. I can easily generate the settings for other MPI implementations as well.
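
To give a sense of what the compile-time output contains, a generated constants file for an MPICH-compatible implementation would look roughly like the following (a hand-written illustration using MPICH's well-known handle values, not actual MPIconstants output):

```julia
# MPICH encodes handles as 32-bit integers with fixed values;
# other implementations (e.g. OpenMPI) represent handles as pointers.
const MPI_Comm = Cint
const MPI_COMM_WORLD = MPI_Comm(0x44000000)
const MPI_COMM_SELF  = MPI_Comm(0x44000001)
```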

eschnett avatar May 25 '22 16:05 eschnett