MPI.jl

Option to disable unsupported features (e.g. matched send/recv)

Open · JBlaschke opened this issue on Aug 4, 2023 · 5 comments

Hey Folks,

I am testing a basic MPI Send/Recv example on NERSC Perlmutter and noticed that the version using MPI.Send and MPI.Recv! works fine, while the version using MPI.send / MPI.recv fails. The error points to https://github.com/JuliaParallel/MPI.jl/blob/bfaa7ca876867897e212523f4e842a66e9afa9f3/src/pointtopoint.jl#L170, and digging further, it turns out that Cray's MPI does not currently support Mprobe.

This bug will be fixed eventually, but I expect this won't be the last time a vendor's MPI implementation fails to support everything in the standard. So my question is: do we want a process for disabling features like matched send/recv?

PSA: If you get:

MPICH ERROR [Rank 1] [job id 13281867.0] [Fri Aug  4 13:47:24 2023] [nid008633] - Abort(808039439) (rank 1 in comm 0): Fatal error in PMPI_Mprobe: Other MPI error, error stack:
PMPI_Mprobe(118)........:  MPI_Mprobe(source=0, tag=5, comm=MPI_COMM_WORLD, message=0x7ffde7da199c, status=0x7ffde7da19a0)
PMPI_Mprobe(101)........:
MPID_Mprobe(199)........:
MPIDI_improbe_safe(146).:
MPIDI_improbe_unsafe(88):
(unknown)(): Other MPI error

then, for now, switch to the upper-case functions (e.g. MPI.Send / MPI.Recv!).

JBlaschke · Aug 04 '23 21:08
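For anyone hitting the Mprobe error, here is a minimal sketch of the workaround, assuming MPI.jl's keyword-argument API (`MPI.Isend(buf, comm; dest, tag)`, `MPI.Recv!(buf, comm; source, tag)`, `MPI.Wait(req)`). The buffer-based functions go straight to MPI_Isend / MPI_Recv and never call MPI_Mprobe; the trade-off is that the receiver must know the message length in advance. `ring_exchange` is a hypothetical helper: each rank sends a small Float64 array to the next rank in a ring, so the sketch also runs on a single rank as a self-send.

```julia
using MPI

# Hypothetical helper: pass a fixed-size Float64 buffer around a ring with the
# buffer-based MPI.Isend / MPI.Recv!, which never invoke MPI_Mprobe.
# The receiver must know the message length `n` in advance.
function ring_exchange(comm::MPI.Comm; n::Integer = 4, tag::Integer = 5)
    rank = MPI.Comm_rank(comm)
    nranks = MPI.Comm_size(comm)
    dest = mod(rank + 1, nranks)     # with a single rank this is a self-send
    source = mod(rank - 1, nranks)   # Julia's mod returns a non-negative result here

    sendbuf = fill(Float64(rank), n)
    recvbuf = Vector{Float64}(undef, n)

    # Non-blocking send, so ranks that all send before receiving cannot deadlock.
    req = MPI.Isend(sendbuf, comm; dest = dest, tag = tag)
    MPI.Recv!(recvbuf, comm; source = source, tag = tag)
    MPI.Wait(req)
    return recvbuf
end

MPI.Initialized() || MPI.Init()
comm = MPI.COMM_WORLD
buf = ring_exchange(comm)
println("rank $(MPI.Comm_rank(comm)) received $(buf)")
```

Using `Isend` instead of the blocking `Send` matters here: if every rank called `Send` first, completion would depend on the MPI library buffering the message.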

To elaborate: by "a process to disable things like matched send/recv" I mean a setting that disables selected non-crucial features. The setting would be optional and would not disable anything by default, but it could be used on systems with incomplete MPI support.

JBlaschke · Aug 04 '23 21:08

You could add a preferences option to switch back to the old (slightly incorrect) behavior from before #699? Seems like a bit of a slippery slope, though.

simonbyrne · Aug 04 '23 21:08

What made that slightly incorrect?

JBlaschke · Aug 04 '23 21:08

I'd have to look at the spec again, but it certainly isn't correct in the presence of multithreading (since there is no guarantee that the receive will match the message that was probed).

simonbyrne · Aug 04 '23 21:08
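A sketch of what an opt-out along these lines might look like, written as user-level code rather than as a change to MPI.jl. `recv_object`, `RECV_LOCK`, and the `use_mprobe` keyword are hypothetical. The fallback path is the probe-then-receive pattern (`MPI.Probe`, `MPI.Get_count`, `MPI.Recv!`), assuming MPI.jl's `MPI.Probe(comm, MPI.Status; source, tag)` form that returns a status. To close the race described above (another thread receiving the probed message first), the probe and the receive run under one process-wide lock. This only helps if every receive on that communicator in the process goes through the same lock, and it assumes concrete (non-wildcard) `source` and `tag`. It also assumes the sender used `MPI.send`, whose payload is a byte vector produced by Julia's `Serialization` stdlib.

```julia
using MPI
using Serialization

# Hypothetical process-wide lock. It only prevents the probe/receive race if
# every receive on the communicator in this process goes through `recv_object`.
const RECV_LOCK = ReentrantLock()

# Hypothetical helper: receive one serialized Julia object from a concrete
# (non-wildcard) `source` and `tag`.
function recv_object(comm::MPI.Comm; source::Integer, tag::Integer, use_mprobe::Bool = true)
    # Default path: MPI.recv, which relies on matched probe (MPI_Mprobe).
    use_mprobe && return MPI.recv(comm; source = source, tag = tag)

    # Fallback for MPI builds without a working MPI_Mprobe.
    return lock(RECV_LOCK) do
        status = MPI.Probe(comm, MPI.Status; source = source, tag = tag)
        nbytes = MPI.Get_count(status, UInt8)
        buf = Vector{UInt8}(undef, nbytes)
        # Same source/tag as the probe; with the lock held, no other thread in
        # this process can receive the probed message first.
        MPI.Recv!(buf, comm; source = source, tag = tag)
        deserialize(IOBuffer(buf))
    end
end

MPI.Initialized() || MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

# Self-send a serialized object so the sketch also runs on a single rank.
io = IOBuffer()
serialize(io, (rank = rank, data = [1, 2, 3]))
payload = take!(io)
req = MPI.Isend(payload, comm; dest = rank, tag = 7)
obj = recv_object(comm; source = rank, tag = 7, use_mprobe = false)
MPI.Wait(req)
println(obj)
```

In a genuinely multithreaded program, MPI would also need to be initialized with `MPI.Init(threadlevel = :multiple)`; a package-level version could read the `use_mprobe` default from a preference instead of a keyword argument.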

Ah! I see

JBlaschke · Aug 04 '23 21:08