Add test for MPI_REQUEST_NULL to avoid error in MPI_Wait[xxx]/MPI_Test[xxx] when using sessions
Issue:
The MPI Standard explicitly allows for MPI_REQUEST_NULL in the list of requests passed to MPI_Wait[xxx]/MPI_Test[xxx].
However, using MPI_Wait[xxx]/MPI_Test[xxx] with a list containing active requests together with null requests leads to the following MPI error:
[n01:00000] *** An error occurred in MPI_Waitall
[n01:00000] *** reported by process [3991601153,3]
[n01:00000] *** on a NULL communicator
[n01:00000] *** Unknown error
[n01:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[n01:00000] *** and MPI will try to terminate your MPI job as well)
--------------------------------------------------------------------------
Reproducer: The following code snippet is a minimal reproducer of the issue:
#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[]) {
    MPI_Request requests[] = {MPI_REQUEST_NULL, MPI_REQUEST_NULL, MPI_REQUEST_NULL};
    MPI_Session session;
    MPI_Group wgroup;
    MPI_Comm wcomm;
    int wrank, wsize, rbuf;

    MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_ARE_FATAL, &session);
    MPI_Group_from_session_pset(session, "mpi://WORLD", &wgroup);
    MPI_Comm_create_from_group(wgroup, "test", MPI_INFO_NULL, MPI_ERRORS_RETURN, &wcomm);
    MPI_Comm_size(wcomm, &wsize);
    MPI_Comm_rank(wcomm, &wrank);
    printf("Rank %d isend to rank %d and irecv from rank %d\n", wrank, (wrank + 1) % wsize, (wsize + wrank - 1) % wsize);
    MPI_Isend(&wrank, 1, MPI_INT, (wrank + 1) % wsize, 42, wcomm, &requests[0]);
    /* Receive into a separate buffer: the send buffer must not be
       modified while the isend is still in flight. */
    MPI_Irecv(&rbuf, 1, MPI_INT, (wsize + wrank - 1) % wsize, 42, wcomm, &requests[1]);
    /* requests[2] is intentionally left as MPI_REQUEST_NULL. */
    MPI_Waitall(3, requests, MPI_STATUSES_IGNORE);
    MPI_Comm_free(&wcomm);
    MPI_Group_free(&wgroup);
    MPI_Session_finalize(&session);
    return 0;
}
Root Cause:
The root cause is that the ompi_comm_instances_same() check fails: MPI_REQUEST_NULL is not associated with any particular instance, so the instance comparison between null requests and valid requests cannot succeed.
Proposed Solution:
This pull request adds a check for ompi_request_null in MPI_Wait[xxx]/MPI_Test[xxx] so that null requests are not passed to the ompi_comm_instances_same() check.
I removed all changes that are not strictly addressing the issue. I will include these changes in another pull request to address code consistency.
@bosilca AFAIK, this is the request we hand out if the operation completes immediately, e.g., if it fits the eager protocol.
Makes sense; we gain peanuts on the small send, but we pay the branching/testing cost on every multi-request test/wait call.
Performance aside, the code we have today seems incomplete with regard to sessions: we need to test for all OMPI predefined requests, namely ompi_request_empty, ompi_request_empty_send, and ompi_request_null.
You're right that ompi_request_empty_send is missing from that list. Maybe these requests should be caught by the check for req_mpi_object on the line below the new check. Currently, they are set to comm_world; I'm not sure why. My understanding is that req_mpi_object is used for error handling, but we won't hand out empty/null requests if an error occurred.
Having said that, these checks are not performance-critical, since they are only performed when argument error checking is enabled.
That's our default. Literally everyone in the world is doing these checks.
A slightly different approach (#12691) was used to solve this.