ompi

Implementation of the current state of MPI Continuations proposal [WIP]

Open devreal opened this issue 4 years ago • 21 comments

This PR adds an implementation of the MPI Continuations proposal as an extension to Open MPI. The proposal is currently under discussion in the MPI Hybrid and Accelerator Working Group. By integrating it as an extension into Open MPI, I hope to provide an up-to-date implementation for the community to experiment with. It reflects the current state of the proposal and will be kept in sync as the proposal evolves.

The implementation is mostly confined to the extension itself. The only exception is a set of hooks in the request test and wait functions that allow polling a continuation request to complete outstanding continuations. These hooks are needed to implement the "mpi_continue_poll_only" info key (see the description below).

Note: this PR is currently WIP as it relies on a workaround to define OMPI_HAVE_MPI_EXT_CONTINUE, which is used to disable the hooks in request test/wait functions if the extension is not enabled. The underlying issue is that the extensions integration currently does not support including the mpiext.h header in implementation files (needed to disable the hooks mentioned above).
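To illustrate what OMPI_HAVE_MPI_EXT_CONTINUE is used for, here is a minimal sketch of a compile-time guarded hook in a request test path. Only the macro name comes from this PR; `request_t`, `cont_poll_hook`, `CONT_POLL_HOOK`, and `request_test` are hypothetical simplifications, not Open MPI code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request; NOT Open MPI's ompi_request_t. */
typedef struct request {
    bool complete;
    bool is_cont_request;  /* true if this is a continuation request */
    int  polls;            /* how many times the hook ran on this request */
} request_t;

#ifdef OMPI_HAVE_MPI_EXT_CONTINUE
/* Hypothetical hook: would poll outstanding continuations of a
 * continuation request; here it only counts invocations. */
static void cont_poll_hook(request_t *req)
{
    if (req->is_cont_request) {
        req->polls++;
    }
}
#define CONT_POLL_HOOK(req) cont_poll_hook(req)
#else
/* Extension disabled: the hook compiles away entirely. */
#define CONT_POLL_HOOK(req) ((void)0)
#endif

/* Simplified test function with the (possibly empty) hook in front. */
static bool request_test(request_t *req)
{
    assert(req != NULL);
    CONT_POLL_HOOK(req);
    return req->complete;
}
```

Built without `-DOMPI_HAVE_MPI_EXT_CONTINUE`, the hook expands to `((void)0)`, so the test path carries no extra work when the extension is disabled.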

Overview of MPI Continuations

Continuations provide a mechanism for attaching callbacks to outstanding operation requests. A call to MPIX_Continue takes a request, a function pointer, a user-provided data pointer, a status object (or MPI_STATUS_IGNORE), and a continuation request, and attaches the continuation to the operation request:

MPI_Request req;
// status object has to remain valid until the callback is invoked
MPI_Status *status = malloc(sizeof(MPI_Status));
char *buf = ...;
MPI_Irecv(buf, ..., MPI_ANY_SOURCE, ..., &req);
MPIX_Continue(&req, &complete_cb, buf, status, cont_req);
assert(req == MPI_REQUEST_NULL);

Ownership of non-persistent requests is returned to MPI, and the request handle is set to MPI_REQUEST_NULL. The callback is passed the status pointer and the user-provided data pointer:

void complete_cb(MPI_Status *status, void *user_data) {
  printf("Receive completed\n");
  char *buf = (char*)user_data;
  process_msg(buf, status->MPI_SOURCE);
  free(buf); // free the receive buffer
  free(status);    // free the status
}

The status has to remain valid until the invocation of the callback and is set according to the operation before the callback is invoked.
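The ordering described above (the status is filled in first, then the callback runs with the status and the user data) can be modelled in plain C without MPI. This is a sketch assuming one continuation per request; `mock_status_t`, `mock_request_t`, `attach_continuation`, and `complete_request` are hypothetical stand-ins, not the extension's API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical mock types; they model, but are not, MPI_Status/MPI_Request. */
typedef struct { int source; int tag; } mock_status_t;
typedef void (*cont_cb_t)(mock_status_t *status, void *user_data);

typedef struct {
    cont_cb_t      cb;        /* attached continuation (NULL if none) */
    void          *user_data; /* passed through unchanged */
    mock_status_t *status;    /* user-owned; must stay valid until cb runs */
} mock_request_t;

/* Attach a continuation: store callback, data, and status pointer. */
static void attach_continuation(mock_request_t *req, cont_cb_t cb,
                                void *user_data, mock_status_t *status)
{
    assert(req->cb == NULL);  /* one continuation per request in this model */
    req->cb = cb;
    req->user_data = user_data;
    req->status = status;
}

/* Called by the (mock) library when the operation finishes:
 * fill in the status first, then invoke the continuation. */
static void complete_request(mock_request_t *req, int source, int tag)
{
    if (req->status != NULL) {   /* NULL models MPI_STATUS_IGNORE */
        req->status->source = source;
        req->status->tag = tag;
    }
    if (req->cb != NULL) {
        req->cb(req->status, req->user_data);
    }
}

static int last_source = -1;

/* Example callback mirroring complete_cb: reads the status, frees both. */
static void example_cb(mock_status_t *status, void *user_data)
{
    last_source = status->source;
    free(user_data);
    free(status);
}
```

Because the status is heap-allocated and freed inside the callback, it stays valid for exactly as long as the rule above requires.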

The continuation is registered with the provided continuation request. The continuation request is a request allocated using MPIX_Continue_init:

MPIX_Continue_init(info, &cont_req);

Continuation requests may be passed to MPI_Test/Wait to test or wait for completion of all continuations registered with them. Supported info keys are:

  • "mpi_continue_poll_only": if true, continuations are only executed when MPI_Test/Wait is called on the continuation request. If false, continuations may be executed during any call into MPI, from a callback registered with opal_progress in this implementation (default: false).
  • "mpi_continue_enqueue_complete": if false, a continuation whose operations are already complete when MPIX_Continue is called is executed immediately; if true, its execution is deferred (default: false)
  • "mpi_continue_max_poll": the maximum number of continuations to execute when calling MPI_Test on the continuation request (default: -1, meaning unlimited)
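The effect of "mpi_continue_enqueue_complete" and "mpi_continue_max_poll" on already-completed operations can be sketched with a toy continuation request that holds a FIFO of deferred continuations. The struct and function names are hypothetical, and the model deliberately ignores operations that are still pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CONTS 64

typedef void (*cont_fn_t)(void *data);

/* Hypothetical model of a continuation request's settings and ready queue. */
typedef struct {
    bool      enqueue_complete;  /* models "mpi_continue_enqueue_complete" */
    int       max_poll;          /* models "mpi_continue_max_poll"; -1 = unlimited */
    cont_fn_t fns[MAX_CONTS];
    void     *data[MAX_CONTS];
    size_t    head, tail;        /* FIFO of deferred, not yet executed continuations */
} cont_req_t;

/* Register a continuation whose operation is already complete. */
static void register_complete(cont_req_t *cr, cont_fn_t fn, void *data)
{
    if (!cr->enqueue_complete) {
        fn(data);                     /* execute immediately */
        return;
    }
    assert(cr->tail < MAX_CONTS);
    cr->fns[cr->tail] = fn;           /* defer until the request is tested */
    cr->data[cr->tail] = data;
    cr->tail++;
}

/* Model of MPI_Test on the continuation request: execute at most max_poll
 * queued continuations and report whether all have been executed. */
static bool cont_test(cont_req_t *cr)
{
    int executed = 0;
    while (cr->head < cr->tail &&
           (cr->max_poll < 0 || executed < cr->max_poll)) {
        cr->fns[cr->head](cr->data[cr->head]);
        cr->head++;
        executed++;
    }
    return cr->head == cr->tail;
}

static void count_cb(void *data) { (*(int *)data)++; }
```

With `max_poll = 2` and three deferred continuations, the first test executes two and reports incomplete; the second executes the last one and reports completion.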

A continuation may in turn be attached to a continuation request, in which case it will be executed once all continuations registered with the continuation request have completed.
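Attaching a continuation to a continuation request amounts to a completion counter: the outer continuation fires when the last registered continuation has executed. A minimal single-threaded sketch, with hypothetical names and no connection to the actual Open MPI data structures:

```c
#include <assert.h>
#include <stddef.h>

typedef void (*cb_t)(void *data);

/* Hypothetical model: a continuation request tracks how many of its
 * registered continuations are still outstanding, plus an optional
 * continuation attached to the continuation request itself. */
typedef struct {
    int   outstanding;
    cb_t  on_all_done;
    void *on_all_done_data;
} cont_req_t;

static void cont_register(cont_req_t *cr) { cr->outstanding++; }

/* Attach a continuation to the continuation request itself. */
static void cont_attach_to_cont_req(cont_req_t *cr, cb_t cb, void *data)
{
    if (cr->outstanding == 0) {
        cb(data);            /* nothing pending in this model: run right away */
        return;
    }
    cr->on_all_done = cb;
    cr->on_all_done_data = data;
}

/* Called after one registered continuation has executed. */
static void cont_executed(cont_req_t *cr)
{
    assert(cr->outstanding > 0);
    if (--cr->outstanding == 0 && cr->on_all_done != NULL) {
        cb_t cb = cr->on_all_done;
        cr->on_all_done = NULL;  /* fire exactly once */
        cb(cr->on_all_done_data);
    }
}

static void mark_done(void *data) { *(int *)data = 1; }
```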

In addition to MPIX_Continue, the proposal also includes MPIX_Continueall which attaches a continuation to a set of requests such that the continuation is executed once all operations have completed.
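The "execute once all operations have completed" semantics of MPIX_Continueall can be modelled with a per-continuation countdown. This sketch assumes, as a simplification, that the user supplies one status entry per operation; `mock_status_t`, `continueall_t`, and the functions are hypothetical and do not reflect the extension's actual signature.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int source; } mock_status_t;   /* hypothetical stand-in */
typedef void (*cont_cb_t)(mock_status_t *statuses, void *user_data);

/* Hypothetical model of one continuation attached to `count` operations. */
typedef struct {
    size_t         remaining;  /* operations not yet complete */
    mock_status_t *statuses;   /* user-provided array of `count` entries, or NULL */
    cont_cb_t      cb;
    void          *user_data;
} continueall_t;

static void continueall_init(continueall_t *c, size_t count,
                             mock_status_t *statuses, cont_cb_t cb,
                             void *user_data)
{
    c->remaining = count;
    c->statuses = statuses;
    c->cb = cb;
    c->user_data = user_data;
}

/* Mock library hook: operation `idx` finished, reporting `source`. */
static void operation_completed(continueall_t *c, size_t idx, int source)
{
    assert(c->remaining > 0);
    if (c->statuses != NULL) {
        c->statuses[idx].source = source;
    }
    if (--c->remaining == 0) {
        c->cb(c->statuses, c->user_data);   /* all operations are done */
    }
}

static void sum_sources(mock_status_t *statuses, void *user_data)
{
    int *sum = user_data;
    *sum = statuses[0].source + statuses[1].source + statuses[2].source;
}
```

Operations may complete in any order; the callback runs only on the last completion, when every status entry has been filled in.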

Signed-off-by: Joseph Schuchart [email protected]
Signed-off-by: George Bosilca [email protected]

devreal, Oct 09 '21 22:10

I pushed a bunch of changes. The biggest change is that a continuation request is now derived from ompi_request_t, which shrank ompi_request_cont_data_t considerably and cuts down on pointer chasing. ompi_request_cont_data_t is now only used to link operation requests to the continuation object and the user-provided status object.
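Deriving one request type from another in C usually means embedding the base struct as the first member, so a pointer to the derived object is also a valid pointer to its base. A minimal sketch of that pattern; `base_request_t` and `cont_request_t` are hypothetical and do not reproduce the real ompi_request_t layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified base request; NOT the real ompi_request_t. */
typedef struct base_request {
    int type;       /* distinguishes request kinds */
    int complete;
} base_request_t;

enum { REQ_TYPE_P2P = 1, REQ_TYPE_CONT = 2 };

/* A continuation request "derived" from the base request by embedding it
 * as the first member; C guarantees no padding before the first member,
 * so &cr->super and cr point to the same address. */
typedef struct cont_request {
    base_request_t super;
    int            num_active;   /* continuations registered but not executed */
} cont_request_t;

/* Generic code sees only the base type ... */
static int request_is_cont(const base_request_t *req)
{
    return req->type == REQ_TYPE_CONT;
}

/* ... and downcasts after checking the type, with no extra pointer hop. */
static int cont_active(base_request_t *req)
{
    assert(request_is_cont(req));
    return ((cont_request_t *)req)->num_active;
}
```

Compared with keeping continuation state in a separately allocated object reached through a pointer, the embedded layout saves one dereference on every access.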

devreal, Oct 18 '21 23:10

Oh, and I will squash everything down to one commit once we're done...

devreal, Oct 18 '21 23:10

@bosilca I pushed a few changes:

  1. Now using the new OMPI_MPIEXT_*_POST_CONFIG hook. I hope this is how it is intended to be used.
  2. I had to make sure that the state of continuation requests is not changed to inactive in test/wait. They are always active and never explicitly (re)started by the user.
  3. Continuations that are poll-only (not executed as part of global progress) are now only executed by the thread testing or waiting for their completion. We no longer merge the local list into the global list; instead, the waiting thread may execute them if it is not blocked on the sync. Otherwise, they are executed at the end of the wait, once all continuations are eligible for execution. I think this change makes sense because it prevents other threads from being disturbed by poll-only continuations.
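The routing described in point 3 can be sketched as two lists: a process-wide list drained by the progress engine and a per-request local list drained only when that request is tested. This single-threaded model is an assumption-laden simplification (no locking, no sync objects); all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CAP 32
typedef void (*cont_fn_t)(void *);

typedef struct {
    cont_fn_t fn[CAP];
    void     *arg[CAP];
    size_t    n;
} cont_list_t;

/* Hypothetical process-wide list drained by the progress engine. */
static cont_list_t global_list;

typedef struct {
    bool        poll_only;   /* models "mpi_continue_poll_only" */
    cont_list_t local;       /* ready poll-only continuations */
} cont_req_t;

static void list_push(cont_list_t *l, cont_fn_t fn, void *arg)
{
    assert(l->n < CAP);
    l->fn[l->n] = fn;
    l->arg[l->n] = arg;
    l->n++;
}

/* Run every queued continuation in order, then empty the list. */
static void list_drain(cont_list_t *l)
{
    for (size_t i = 0; i < l->n; i++) {
        l->fn[i](l->arg[i]);
    }
    l->n = 0;
}

/* A continuation became ready: route it by the request's poll-only flag. */
static void cont_ready(cont_req_t *cr, cont_fn_t fn, void *arg)
{
    list_push(cr->poll_only ? &cr->local : &global_list, fn, arg);
}

/* Any call into the library drives progress: only global continuations run. */
static void progress(void) { list_drain(&global_list); }

/* Testing a continuation request also runs its local (poll-only) list,
 * so only the testing thread ever executes those continuations. */
static void cont_req_poll(cont_req_t *cr)
{
    progress();
    list_drain(&cr->local);
}

static void incr(void *p) { (*(int *)p)++; }
```

A plain `progress()` call leaves poll-only continuations untouched; only `cont_req_poll` on the owning request executes them.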

devreal, Nov 09 '21 15:11

Hello! The Git Commit Checker CI bot found a few problems with this PR:

d80d3caa: Remove re-iteration of continuation requests in te...

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

github-actions[bot], Nov 03 '22 17:11

Hello! The Git Commit Checker CI bot found a few problems with this PR:

f3babb54: Remove re-iteration of continuation requests in te...

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

github-actions[bot], Nov 22 '22 15:11

Hello! The Git Commit Checker CI bot found a few problems with this PR:

a32e7e3c: Remove re-iteration of continuation requests in te...

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

github-actions[bot], Apr 02 '24 19:04
