mpich
mpich copied to clipboard
comm: Check for unmatched messages before releasing context_id
Pull Request Description
If there are still messages using a communicator context_id in the unexpected queue at destruction time, they could disturb message matching if the context_id is reused. To prevent this, do not allow a communicator with unmatched messages to be freed, and therefore retain the context_id. See pmodels/mpich#6184.
Author Checklist
- [x] Provide Description Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
- [x] Commits Follow Good Practice
Commits are self-contained and do not do two things at once.
Commit message is of the form:
module: short description
Commit message explains what's in the commit. - [ ] Passes All Tests Whitespace checker. Warnings test. Additional tests via comments.
- [x] Contribution Agreement For non-Argonne authors, check contribution agreement. If necessary, request an explicit comment from your companies PR approval manager.
test:mpich/ch4/most test:mpich/ch3/most
test:mpich/ch4/most test:mpich/ch3/most
test:mpich/ch4/most test:mpich/ch3/most
test:mpich/ch4/most test:mpich/ch3/most test:mpich/warnings
test:mpich/ch4/most test:mpich/ch3/most
test:mpich/ch4/most test:mpich/ch3/most
Something is up with Jenkins and/or Github. Jobs are failing when checking out the repo.
test:mpich/ch4/most test:mpich/ch3/most
test:mpich/ch4/most test:mpich/ch3/most test:mpich/warnings
Dropped the commit changing the default error handler to MPI_COMM_SELF
. It appears many of our tests rely on the default being MPI_COMM_WORLD
. Will split that change to a separate PR that also fixes the tests.
test:mpich/ch4/most test:mpich/ch3/most
test:mpich/ch4/most test:mpich/ch3/most
This change is detecting unreceived messages from unaccepted intercommunicator connection requests. Should that be possible with an MPI_ANY_TAG, MPI_ANY_SOURCE
probe/recv? I guess I assumed they would be isolated from regular pt2pt messages. In the Jenkins link, the messages are found on MPI_COMM_WORLD
: https://jenkins-pmrs.cels.anl.gov/job/mpich-review-ch4-ofi/3943/jenkins_configure=ubsan,label=centos64_review/testReport/junit/(root)/errors_spawn/02755_____errors_spawn_connect_timeout_no_accept_2__MPIR_CVAR_CH3_COMM_CONNECT_TIMEOUT_20_MPIR_CVAR_CH4_COMM_CONNECT_TIMEOUT_20/
test:mpich/ch4/most test:mpich/ch3/most
This change is detecting unreceived messages from unaccepted intercommunicator connection requests. Should that be possible with an
MPI_ANY_TAG, MPI_ANY_SOURCE
probe/recv? I guess I assumed they would be isolated from regular pt2pt messages. In the Jenkins link, the messages are found onMPI_COMM_WORLD
: https://jenkins-pmrs.cels.anl.gov/job/mpich-review-ch4-ofi/3943/jenkins_configure=ubsan,label=centos64_review/testReport/junit/(root)/errors_spawn/02755_____errors_spawn_connect_timeout_no_accept_2__MPIR_CVAR_CH3_COMM_CONNECT_TIMEOUT_20_MPIR_CVAR_CH4_COMM_CONNECT_TIMEOUT_20/
The issue is that the MPIDI_OFI_DYNPROC_SEND
bit is ignored when matching user messages. It needs to be included so preserve the isolation. Will add another patch to fix...
test:mpich/ch4/most test:mpich/ch3/most
test:mpich/ch4/most test:mpich/ch3/most
Tests look good. Warning test is unrelated and fixed in another PR, I believe. The lone tcp failure is a th_taskmaster
timeout on Solaris, which happens from time to time.
Pushed a minor whitespace fixup. This is ready for review.
test:mpich/ch4/most test:mpich/ch3/most
Sorry, forgot to squash my fixup before merging 🤦.