
MPI_Waitall doesn't return MPI_ERR_IN_STATUS for persistent request

mentOS31 opened this issue 2 months ago

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

main

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

git clone

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

 08e41ed5629b51832f5708181af6d89218c7a74e 3rd-party/openpmix (v1.1.3-4067-g08e41ed5)
 30cadc6746ebddd69ea42ca78b964398f782e4e3 3rd-party/prrte (psrvr-v2.0.0rc1-4839-g30cadc6746)
 6032f68dd9636b48977f59e986acc01a746593a6 3rd-party/pympistandard (remotes/origin/main-23-g6032f68)
 dfff67569fb72dbf8d73a1dcf74d091dad93f71b config/oac (dfff675)

Please describe the system on which you are running

  • Operating system/version: rockylinux:10.0
  • Computer hardware: Ampere Altra MAX or NVIDIA Grace
  • Network type: intra-node

Details of the problem

Problem 1

MPI_Waitall() and MPI_Testall() don't return MPI_ERR_IN_STATUS under the following conditions:

(1) an MPI_Testall() or MPI_Waitall() completing procedure is called,
(2) the communicator associated with the request has the MPI_ERRORS_RETURN error handler,
(3) the handle of a persistent communication request is passed in the array_of_requests argument of the completing procedure, and
(4) the persistent request completes with an error (req_status.MPI_ERROR != MPI_SUCCESS).

Table 1: Multiple completion functions that return multiple statuses

Function       array_of_statuses == MPI_STATUSES_IGNORE   array_of_statuses != MPI_STATUSES_IGNORE
MPI_Waitall    valid                                      invalid
MPI_Waitsome   valid                                      valid
MPI_Testall    invalid                                    invalid
MPI_Testsome   valid                                      valid

"Invalid" means that, under the conditions above, the completing procedure doesn't return MPI_ERR_IN_STATUS; "valid" means it does.

Reason

For example, ompi_request_default_wait_all() in ompi/request/req_wait.c appears to execute a continue statement for a persistent request before it assigns MPI_ERR_IN_STATUS to the mpi_error return-code variable, so the persistent request's error is never reflected in the return value.

code

            if( request->req_persistent ) {
                request->req_state = OMPI_REQUEST_INACTIVE;
                continue;
            }
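The following is a simplified, self-contained C model of the ordering problem, not Open MPI code: toy_request, the TOY_* constants (arbitrary values) and both functions are hypothetical, and the real loop also waits, copies statuses and frees requests. It only contrasts the ordering in the excerpt, where the persistent-request continue runs before the error is checked, with an ordering that records MPI_ERR_IN_STATUS first.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the values are arbitrary, not Open MPI's. */
enum { TOY_SUCCESS = 0, TOY_ERR_OTHER = 1, TOY_ERR_IN_STATUS = 2 };
enum toy_state { TOY_ACTIVE, TOY_INACTIVE };

typedef struct {
    bool persistent;       /* models request->req_persistent */
    enum toy_state state;  /* models request->req_state */
    int status_error;      /* models request->req_status.MPI_ERROR */
} toy_request;

/* Ordering as in the quoted excerpt: persistent requests are made
 * inactive and skipped before their error is looked at. */
int toy_wait_all_buggy(toy_request *reqs, size_t count)
{
    int mpi_error = TOY_SUCCESS;
    for (size_t i = 0; i < count; i++) {
        toy_request *request = &reqs[i];
        if (request->persistent) {
            request->state = TOY_INACTIVE;
            continue;  /* status_error of this request is never checked */
        }
        if (request->status_error != TOY_SUCCESS) {
            mpi_error = TOY_ERR_IN_STATUS;
        }
    }
    return mpi_error;
}

/* Alternative ordering: record the error first, then deactivate
 * persistent requests. */
int toy_wait_all_fixed(toy_request *reqs, size_t count)
{
    int mpi_error = TOY_SUCCESS;
    for (size_t i = 0; i < count; i++) {
        toy_request *request = &reqs[i];
        if (request->status_error != TOY_SUCCESS) {
            mpi_error = TOY_ERR_IN_STATUS;
        }
        if (request->persistent) {
            request->state = TOY_INACTIVE;
            continue;
        }
        /* Cleanup of nonpersistent requests is omitted from this model. */
    }
    return mpi_error;
}
```

With one nonpersistent request that succeeded and one persistent request whose status holds an error, toy_wait_all_buggy() returns TOY_SUCCESS while toy_wait_all_fixed() returns TOY_ERR_IN_STATUS; both leave the persistent request inactive.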

Problem 2

MPI_Testany() doesn't return the communication error under the following conditions:

(1) an MPI_Testany() completing procedure is called,
(2) the communicator associated with the request has the MPI_ERRORS_RETURN error handler,
(3) the handle of a persistent communication request is passed in the array_of_requests argument of the completing procedure, and
(4) the persistent request completes with an error (req_status.MPI_ERROR != MPI_SUCCESS).

Reason

ompi_request_default_test_any() in ompi/request/req_test.c appears to return OMPI_SUCCESS unconditionally once the completed request is persistent, discarding the error stored in its req_status.MPI_ERROR.

code

            if( request->req_persistent ) {
                request->req_state = OMPI_REQUEST_INACTIVE;
                return OMPI_SUCCESS;
            }
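A similar hypothetical model for the any-completion case: once the completed request is found, the persistent branch can return the request's own error instead of success. toy_request, toy_test_any(), its fixed switch and the TOY_* constants (arbitrary values) are illustrative assumptions, not Open MPI code; error-handler invocation and status copying are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the values are arbitrary, not Open MPI's. */
enum { TOY_SUCCESS = 0, TOY_ERR_OTHER = 1, TOY_UNDEFINED = -1 };

typedef struct {
    bool active;       /* false models OMPI_REQUEST_INACTIVE */
    bool complete;     /* has the operation finished? */
    bool persistent;   /* models request->req_persistent */
    int status_error;  /* models request->req_status.MPI_ERROR */
} toy_request;

/* Testany-like scan. fixed == false mimics the quoted excerpt (a persistent
 * request always reports success); fixed == true reports its error. */
int toy_test_any(toy_request *reqs, size_t count, int *index, bool *flag,
                 bool fixed)
{
    bool any_active = false;

    *index = TOY_UNDEFINED;
    for (size_t i = 0; i < count; i++) {
        toy_request *request = &reqs[i];
        if (!request->active) {
            continue;
        }
        any_active = true;
        if (!request->complete) {
            continue;
        }
        *index = (int)i;
        *flag = true;
        if (request->persistent) {
            request->active = false;
            return fixed ? request->status_error : TOY_SUCCESS;
        }
        /* Freeing of nonpersistent requests is omitted from this model. */
        return request->status_error;
    }
    /* As with MPI_Testany, flag is true when no request was active. */
    *flag = !any_active;
    return TOY_SUCCESS;
}
```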

Problem 3

MPI completing procedures such as MPI_Wait() and MPI_Test() may free the request handle of a persistent communication request under the following conditions:

(1) one of these completing procedures is called,
(2) the handle of a persistent communication request is passed to the completing procedure (as its request argument or in array_of_requests), and
(3) the persistent request completes with an error (req_status.MPI_ERROR != MPI_SUCCESS).

Reason

When req_status.MPI_ERROR != MPI_SUCCESS, ompi_errhandler_request_invoke() in ompi/errhandler/errhandler_invoke.c appears to free the request by calling ompi_request_free(), even if it is a persistent request. The only exception is the fault-tolerance (OPAL_ENABLE_FT_MPI) case in which the error is MPI_ERR_PROC_FAILED_PENDING.

code

        if (MPI_REQUEST_NULL != requests[i] &&
            MPI_SUCCESS != requests[i]->req_status.MPI_ERROR) {
#if OPAL_ENABLE_FT_MPI
            /* Special case for MPI_ANY_SOURCE when marked as
             * MPI_ERR_PROC_FAILED_PENDING,
             * This request should not be freed since it is still active. */
            if( MPI_ERR_PROC_FAILED_PENDING != requests[i]->req_status.MPI_ERROR ) {
                ompi_request_free(&(requests[i]));
            }
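The MPI standard specifies that completing a persistent request makes it inactive rather than deallocating it, so the user's handle should stay valid for a later MPI_Start() or MPI_Request_free(). A hypothetical sketch of error cleanup that respects this follows; toy_request, toy_request_free() and toy_cleanup_errored() are illustrative names, the TOY_* values are arbitrary, and the OPAL_ENABLE_FT_MPI special case from the excerpt is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the values are arbitrary, not Open MPI's. */
enum { TOY_SUCCESS = 0, TOY_ERR_OTHER = 1 };

typedef struct {
    bool persistent;   /* models request->req_persistent */
    bool active;       /* false models OMPI_REQUEST_INACTIVE */
    int status_error;  /* models req_status.MPI_ERROR */
} toy_request;

/* Release a request and null the caller's handle, the way
 * MPI_Request_free sets the handle to MPI_REQUEST_NULL. */
static void toy_request_free(toy_request **handle)
{
    free(*handle);
    *handle = NULL;
}

/* On error, free nonpersistent requests but only deactivate
 * persistent ones, so the user's handle remains valid. */
void toy_cleanup_errored(toy_request **requests, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (requests[i] != NULL && requests[i]->status_error != TOY_SUCCESS) {
            if (requests[i]->persistent) {
                requests[i]->active = false;
            } else {
                toy_request_free(&requests[i]);
            }
        }
    }
}
```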

mentOS31, Oct 10 '25 05:10