ch4: lingering sent message interferes with later communicator
Issue https://github.com/pmodels/mpich/issues/6182 shows a case where a message sent on one communicator can interfere with traffic on a different communicator created later. This happens because the communicator is freed once the send completes, and a later communicator can pick up the same context_id. Because the sent message is still in the receiver's unexpected message queue, it matches the later traffic.
https://github.com/pmodels/mpich/issues/6182#issuecomment-1253181646 provides a reproducer. We probably can simplify it into a much smaller reproducer.
I am not sure how to fix it.
One solution is to allocate a few bits in the match bits and set them to a sequence number of the communicator. But that simply mitigates the issue since the sequence number will wrap around and collide as well.
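To make the wrap-around concern concrete, here is a minimal sketch of the idea. The bit layout, the 4-bit sequence width, and the name `make_match_bits` are all assumptions for illustration; this is not MPICH's actual match-bit format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only:
 * [ seq : 4 bits | context_id : 16 bits | tag : 32 bits ] */
#define SEQ_BITS 4
#define CTX_BITS 16
#define TAG_BITS 32
#define SEQ_MASK ((1u << SEQ_BITS) - 1)

/* Pack a per-communicator sequence number next to the context_id so that
 * two communicators reusing the same context_id get different match bits. */
static uint64_t make_match_bits(uint32_t seq, uint16_t context_id, uint32_t tag)
{
    return ((uint64_t)(seq & SEQ_MASK) << (CTX_BITS + TAG_BITS)) |
           ((uint64_t)context_id << TAG_BITS) |
           (uint64_t)tag;
}
```

With this layout, sequence numbers 0 and 1 produce different match bits for the same context_id and tag, but sequence number 16 produces the same match bits as 0, so the collision is only postponed.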
Alternatively, we can enhance the context_id algorithm so that it avoids picking up a freed context_id right away. The current context_id allocation algorithm is unfortunately quite complex.
If we implement send-delivery completion, that will certainly fix the bug, but it will be very bad for performance, especially for small latency-sensitive messages.
I don't know nothing about anything but if the context_id is getting reused, can you somehow delay re-issuing the same context_id? I imagine in practice wrap-around is perfectly fine - we know of only one case (NEC5k about ten years ago) that made tons and tons of communicators.
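As a rough illustration of delaying reuse, the sketch below hands out never-used ids first and recycles freed ids oldest-first only once the fresh ids are exhausted. It is a single-process toy under stated assumptions: `id_pool`, `pool_alloc`, `pool_free`, and the pool size are hypothetical names and values, and it does not reflect MPICH's real context_id allocation, which has to agree on an id across processes.

```c
#include <assert.h>

#define MAX_IDS 8          /* hypothetical pool size, tiny for demonstration */

typedef struct {
    int next_fresh;        /* next id that has never been handed out */
    int freed[MAX_IDS];    /* FIFO ring of freed ids, oldest first */
    int head, count;
} id_pool;

static void pool_init(id_pool *p)
{
    p->next_fresh = 0;
    p->head = 0;
    p->count = 0;
}

/* Returns an id, or -1 if none is available. Fresh ids are preferred, so a
 * freed id is not reissued until every other id has been used once. */
static int pool_alloc(id_pool *p)
{
    if (p->next_fresh < MAX_IDS)
        return p->next_fresh++;
    if (p->count == 0)
        return -1;
    int id = p->freed[p->head];
    p->head = (p->head + 1) % MAX_IDS;
    p->count--;
    return id;
}

/* Queue a freed id behind all previously freed ids. */
static void pool_free(id_pool *p, int id)
{
    assert(id >= 0 && id < MAX_IDS && p->count < MAX_IDS);
    p->freed[(p->head + p->count) % MAX_IDS] = id;
    p->count++;
}
```

This only widens the window before an id comes back; a lingering message can still collide once the pool wraps around, which matches the observation that wrap-around is rare in practice rather than impossible.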
It's tricky when matching is done below the MPI library, since the receiver doesn't know what unexpected messages might be in the queue. With software matching, we could add a reference to the comm at the receive side and prevent the context_id from being reused.
Maybe we can do the equivalent to an MPI_ANY_TAG/MPI_ANY_SOURCE probe when freeing the communicator and keep it around if something is found in the queue? Debug builds would warn the user about the unreturned comm handle at finalize. Or we could do a truncated recv to remove the unmatched entry and issue some other warning.
That may not be a bad idea. I think an anysrc/anytag probe should work.
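For discussion, here is a toy model of the probe-on-free check. It is not the patch and not MPICH code: the unexpected queue is modeled as a plain array, and `unexp_msg`, `probe_any`, and `can_release_context` are hypothetical names. In a real implementation the probe would go through whatever layer does the matching, along the lines of an `MPI_Iprobe` with `MPI_ANY_SOURCE` and `MPI_ANY_TAG` on the communicator being freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an entry in the unexpected message queue. */
typedef struct {
    int context_id;
    int source;
    int tag;
} unexp_msg;

/* Equivalent of an any-source/any-tag probe: match on context_id only. */
static bool probe_any(const unexp_msg *q, size_t n, int context_id)
{
    for (size_t i = 0; i < n; i++)
        if (q[i].context_id == context_id)
            return true;
    return false;
}

/* Called when a communicator is freed. Returns true if the context_id can
 * be released for reuse, false if a lingering message means it must be
 * kept around (or drained) first. */
static bool can_release_context(const unexp_msg *q, size_t n, int context_id)
{
    return !probe_any(q, n, context_id);
}
```

When the check returns false, the caller would either hold on to the context_id or receive and discard the unmatched entry, as suggested above.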
I'll hack up a patch and post it to try.