ompi
Sessions: error checking for invalid argument
While extending the mpi4py test suite, I found several issues related to error checking and invalid arguments.
Issues related to using the MPI_SESSION_NULL handle
The following routines do not fail when invoked with the MPI_SESSION_NULL handle:
- MPI_Session_get_num_psets (deadlocks)
- MPI_Session_get_nth_pset
- MPI_Group_from_session_pset
- MPI_Session_get_errhandler
- MPI_Session_set_errhandler
Issues related to other invalid arguments
- MPI_Session_get_nth_pset: Using an index n that is positive but out of bounds (greater than or equal to the number of process sets) deadlocks rather than returning an error code (I would suggest using the MPI_ERR_ARG error class).
- MPI_Session_get_pset_info: Requesting the info object for a non-existent pset name (e.g. something like the pset_name string "@qerty!#$") does not fail. Instead, it succeeds and returns an info object with a single key/value pair ("size", "0").
- MPI_Group_from_session_pset: Trying to create a group from a non-existent pset name (e.g. something like the pset_name string "@qerty!#$") errors with MPI_ERR_INTERN. It would be much more informative to users if MPI_ERR_ARG were used instead.
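A test for the invalid-argument cases above has to tell three outcomes apart: the call fails with the suggested error class, it fails with a different class, or it does not fail at all. Here is a minimal sketch of such a check. The helper names `classify_outcome` and `check_invalid_pset_name` are hypothetical and not part of mpi4py or Open MPI, and the second function assumes an mpi4py build with Sessions support.

```python
def classify_outcome(call, expected_class):
    """Run call() and compare the raised MPI error class with the expected one.

    Any exception exposing Get_error_class() is treated like MPI.Exception.
    """
    try:
        call()
    except Exception as exc:
        get_class = getattr(exc, "Get_error_class", None)
        if get_class is None:
            raise  # not an MPI-style error; let it propagate
        actual = get_class()
        if actual == expected_class:
            return "ok"
        return f"wrong class {actual}"
    return "no error raised"


def check_invalid_pset_name():
    # Imported lazily so the helper above can be used without MPI installed.
    from mpi4py import MPI

    session = MPI.Session.Init()
    try:
        bogus = "@qerty!#$"  # non-existent pset name taken from the report
        return classify_outcome(
            lambda: session.Get_pset_info(bogus), MPI.ERR_ARG
        )
    finally:
        session.Finalize()
```

With the behavior described above, `check_invalid_pset_name()` would be expected to return "no error raised", since the call currently succeeds. A check like this cannot catch the deadlocking cases, because the call never returns.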
Refs #10589
The MPI_Session_get_pset_info call with an invalid pset name may be a PMIx issue. Checking...
@hppritcha I'm seeing just one issue left, although perhaps I did not report it before. Sorry about that, there were many loose ends, and I just missed the following one.
MPI_Session_get_info() with MPI_SESSION_NULL fails with error MPI_ERR_ARG, but should fail with MPI_ERR_SESSION.
@hppritcha Another recent issue [logs]: it happened on GitHub Actions with 3 MPI processes (but not with 1 or 2). I could not reproduce it with main (updated a few hours ago). I'll try to restart the build.
The failing test is related to the following reproducer:
from mpi4py import MPI
session = MPI.Session.Init()
num = session.Get_num_psets()
try:
    pset = session.Get_nth_pset(num)
except MPI.Exception:
    pass
else:
    print("Exception not raised!")
    MPI.COMM_WORLD.Abort()
session.Finalize()
As you can see in the reproducer, requesting an out-of-bounds pset index should fail with an exception. But that's not happening in GitHub Actions when using 3 MPI processes.
Not sure whether this is relevant, but GitHub Actions runners have 2 virtual cores, so I'm running with oversubscription turned on.
@hppritcha Another maybe related issue: when running in singleton init mode, the reproducer above deadlocks at the ~pset = session.Get_nth_pset(num)~ num = session.Get_num_psets() line.
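Since a deadlock never returns control to the test, one way to turn a hang like this into a reportable failure is to run the reproducer in a child process under a timeout. This is a minimal sketch using only the Python standard library; `run_with_timeout` is a hypothetical helper name, and the timeout value is an arbitrary assumption.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd in a child process.

    Returns the exit code, or the string "hang" if the child is still
    running after timeout_s seconds (subprocess.run kills it in that case).
    """
    try:
        done = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "hang"
    return done.returncode
```

For the singleton case, the call might look like `run_with_timeout([sys.executable, "reproducer.py"], 60)`, where `reproducer.py` is a hypothetical file containing the reproducer above. When the command is a launcher such as mpiexec, only the launcher itself is killed on timeout, so stray ranks may need separate cleanup.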
@hppritcha Now I'm not sure this new issue is related to this one. Do you want me to open a new one?
It's a different problem, so please open a different issue.
closed via #10744 and #10784