coll: replace MPIR_ERR_COLL_CHECKANDCONT
Pull Request Description
Replace MPIR_ERR_COLL_CHECKANDCONT with MPIR_ERR_CHECK. Propagating errors in collectives does not work because of the complexity of collective algorithms. For example, the error condition is not guaranteed to reach all processes. In addition, when a random hardware issue prevents a message from being delivered, trying to propagate the error only hides it and results in a hang anyway.
[skip warnings]
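
To illustrate the behavioral difference described above, here is a minimal, self-contained sketch. It does not reproduce MPICH's macro definitions: every `SKETCH_*` and `sketch_*` name is hypothetical, and the "collective" is only a loop of steps where one step can be made to fail. It assumes the check-and-continue style records the error and keeps running the algorithm, while the plain check style jumps to the failure label at the first error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not MPICH's definitions. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_OTHER = 1 };

/* Fail-fast style: leave the algorithm at the first error. */
#define SKETCH_ERR_CHECK(rc_) \
    do { if (rc_) goto fn_fail; } while (0)

/* Check-and-continue style: remember the first error, set a flag, keep going. */
#define SKETCH_ERR_CHECKANDCONT(rc_, errflag_, ret_) \
    do { if (rc_) { (errflag_) = 1; if (!(ret_)) (ret_) = (rc_); } } while (0)

/* Hypothetical communication step that fails when i == fail_at. */
static int sketch_step(int i, int fail_at)
{
    return (i == fail_at) ? SKETCH_ERR_OTHER : SKETCH_SUCCESS;
}

/* Returns at the first failing step; *steps_run counts the steps attempted. */
int sketch_coll_fail_fast(int nsteps, int fail_at, int *steps_run)
{
    int rc = SKETCH_SUCCESS;
    assert(steps_run != NULL);
    *steps_run = 0;
    for (int i = 0; i < nsteps; i++) {
        (*steps_run)++;
        rc = sketch_step(i, fail_at);
        SKETCH_ERR_CHECK(rc);
    }
  fn_exit:
    return rc;
  fn_fail:
    goto fn_exit;
}

/* Runs every step even after a failure and reports the first error at the end. */
int sketch_coll_check_and_cont(int nsteps, int fail_at, int *steps_run)
{
    int rc, ret = SKETCH_SUCCESS, errflag = 0;
    assert(steps_run != NULL);
    *steps_run = 0;
    for (int i = 0; i < nsteps; i++) {
        (*steps_run)++;
        rc = sketch_step(i, fail_at);
        SKETCH_ERR_CHECKANDCONT(rc, errflag, ret);
    }
    (void) errflag; /* only recorded in this sketch, never sent to peers */
    return ret;
}
```

With four steps and a failure injected at step index 1, the fail-fast variant stops after two attempted steps, while the check-and-continue variant attempts all four before returning the same error code. In a real collective the later steps would involve other processes, which is where the propagation problems described above arise.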
Author Checklist
- [x] Provide Description Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
- [x] Commits Follow Good Practice
Commits are self-contained and do not do two things at once.
Commit message is of the form:
module: short description
Commit message explains what's in the commit.
- [ ] Passes All Tests Whitespace checker. Warnings test. Additional tests via comments.
- [x] Contribution Agreement For non-Argonne authors, check contribution agreement. If necessary, request an explicit comment from your company's PR approval manager.
test:mpich/ch3/most
test:mpich/ch4/most
Is error propagation required by the MPI standard for collectives?
No.