ompi icon indicating copy to clipboard operation
ompi copied to clipboard

Error propagation in free_fn callback for generalized requests

Open dalcinl opened this issue 2 years ago • 4 comments
trafficstars

This is with branch v5.0.x, but I guess main has the same issue (currently, I'm unable to build main with internal pmix on Fedora 38, I'll submit another issue).

Reproducer

#include <stddef.h>
#include <mpi.h>

static int query_fn  (void *ctx, MPI_Status *s) { return MPI_SUCCESS; }
static int free_fn   (void *ctx) { return MPI_ERR_OTHER; }  // <-- RETURN WITH FAILURE !!!
static int cancel_fn (void *ctx, int c) { return MPI_SUCCESS;   }

int main(int argc, char *argv[])
{
  MPI_Request request;
  MPI_Init(&argc, &argv);
  MPI_Grequest_start(query_fn, free_fn, cancel_fn, NULL, &request);

  MPI_Grequest_complete(request);
  MPI_Wait(&request, MPI_STATUS_IGNORE);

  MPI_Finalize();
  return 0;
}

Expected behavior

The reproducer should abort (via default error handler).

Actual behavior

The reproducer runs to completion with success and no output. The error code from free_fn is not propagated to MPI_Wait as required by the MPI standard.

dalcinl avatar May 16 '23 12:05 dalcinl

@jsquyres Could you please reopen this issue? Thins still do not work quite as I would expect.

Here you have another reproducer:

#include <stdio.h>
#include <mpi.h>

static int query_fn  (void *ctx, MPI_Status *s) { return MPI_SUCCESS; }
static int free_fn   (void *ctx) { return MPI_ERR_OTHER; }  // <-- RETURN WITH FAILURE !!!
static int cancel_fn (void *ctx, int c) { return MPI_SUCCESS;   }

int main(int argc, char *argv[])
{
  int ierr;
  MPI_Request request;
  MPI_Init(&argc, &argv);

  MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
  MPI_Grequest_start(query_fn, free_fn, cancel_fn, NULL, &request);
  MPI_Grequest_complete(request);

  {
    ierr = MPI_Wait(&request, MPI_STATUS_IGNORE);
    printf("After Wait() - error: %d, active: %d\n", ierr, request!=MPI_REQUEST_NULL);
  }
  
  if (MPI_REQUEST_NULL != request) {
    ierr = MPI_Request_free(&request);
    printf("After Free() - error: %d, active: %d\n", ierr, request!=MPI_REQUEST_NULL);
  }
  
  MPI_Finalize();
  return 0;
}

I'm getting the following output:

$ mpicc test.c
$ ./a.out 
After Wait() - error: 16, active: 1
After Free() - error: 17, active: 1

As you can see, I'm still getting an error after MPI_Request_free. Additionally, the request variable is not set to MPI_REQUEST_NULL afterwards. This kind of behavior make error handling and handle lifetime management cumbersome. IMHO, If the call to Wait() fails and does not fully deallocate the request object and set it to MPI_REQUEST_NULL on return (maybe this is the ideal behavior), then the call to Free() should not fail at all (the error has already been reported at Wait()) and of course the request should be fully deallocated and set to MPI_REQUEST_NULL.

dalcinl avatar Jan 23 '24 10:01 dalcinl

The grequest provided free function returned an error, why would you expect the request to be MPI_REQUEST_NULL ? For what we know the request was not freed.

bosilca avatar Jan 23 '24 18:01 bosilca

The grequest provided free function returned an error, why would you expect the request to be MPI_REQUEST_NULL ? For what we know the request was not freed.

The free function is a user thing used to release user resources. If the free function somehow fails to release these user resources, there is nothing else to do with the user stuff. However, MPI could happily continue and release internal MPI stuff, and then, at the end, return an error code to signal the user free failure.

If you do not like/want the above behavior, then at least don't make MPI_Request_free() fail again. Why would MPI_Request_free() fail again? a) The user free function is not being called again b) the error was already reported and hopefully handled at the Wait() or Test() call. Even worse, after the MPI_Request_free(), the request handle is not set to MPI_REQUEST_NULL, making handle lifetime management hard.

Please look at my reproducer above. Ideally, I would argue that the example should print a single line with

After Wait() - error: 16, active: 0

That is, the the wait call deallocates everything MPI-internal, then completes with error and sets request=MPI_REQUEST_NULL on return.

Again, if that is not possible, what I'm asking for is to at least make the example print

After Wait() - error: 16, active: 1
After Free() - error: 0, active: 0

that is, MPI_Request_free() deallocates everything MPI-internal, then completes successfully and sets request=MPI_REQUEST_NULL on return.

dalcinl avatar Jan 23 '24 19:01 dalcinl

FWIW, the MPICH behavior is for MPI_Wait to set request=MPI_REQUEST_NULL and fail returning error. This is my preferred implementation from my previous comment. If you decide to do things differently, it would be just another instance of MPI implementation exhibiting different behavior.

EDIT: Same thing for MPI_Test.

dalcinl avatar Jan 23 '24 19:01 dalcinl