ompi
ompi copied to clipboard
v5.0.x: Propagate the error up to the user.
Make sure to reset the generalized request to guarantee not to call the free callback a second time.
Signed-off-by: George Bosilca [email protected] (cherry picked from commit ac3647e16d2b9765fec2e60a61c92e9b8661efc4)
This is the v5.0.x PR corresponding to main PR #11683 Refs #11681.
FYI @dalcinl
@jsquyres I run this PR: https://github.com/mpi4py/mpi4py-testing/actions/runs/7588222142/job/20670268020 My test is failing, but that's not necessarily Open MPI's fault, but my assumptions. This is how things go:
- I call
Wait, then the error in the free_fn callback triggers, and MPI reports back the error. All good so far. - The call to
Waitdoes set the request toNULL, so it is not yet deallocated. - I call
Freeto fully delete the request. But this call fails withMPI_ERR_INTERN. Are you sure that's the proper behavior? Technically, there is no error this time, as the free_fn routine is not invoked again, and I believe there is no error this time to report, as the actual error was already reported atWait.
PS: For reference, this test is passing with MPICH, but that's simply because of their implementation choices.
@dalcinl @jsquyres what is the plan for this change? do we want to carry it in the release next week?
I'll perform a new test today with ompi@main, I'm not convinced the current behavior is the right one (not that my opinion matters, of course).
@dalcinl Did you get a chance to do a test, our goal was to get an rc out today.
@janjust Yes, I added my comments in #11681 with a request for the issue to be reopened. IMHO, things are still broken. If it were my call, I would not merge this PR.
Thanks - let me discuss with RMs, thank you very much for the reproducers!
Now that @hppritcha has added mpi4py for v5.0.x, I rebased this PR and will check CI result. I assume it's safe to merge if CI passes.
@wenduwan Please take into account #12392, that's IMHO the definitive fix, the changes there should be added here.