
bug/jenkins: pt2pt/rqfreeb leaks requests

Open minsii opened this issue 7 years ago • 8 comments

In test pt2pt/rqfreeb, the last irecv request r[4] is intentionally freed with MPI_Request_free (line 111). In CH4/UCX, however, the request is only completed in MPIDI_UCX_recv_cmpl_cb, which may or may not be called before MPI_Finalize. As a result, r[4] might never be released, which triggers a memory-leak warning. https://github.com/pmodels/mpich/blob/baf32f4b8a84ccc6bd18ddd7e00a2422f4e59ed9/test/mpi/pt2pt/rqfreeb.c#L111

Failing example: https://jenkins.mpich.org/job/mpich-review-ch4-ucx/1639/compiler=gnu,jenkins_configure=debug,label=ib64/testReport/junit/(root)/summary_junit_xml/585_____pt2pt_rqfreeb_4__/

Possible solution: since this test exercises ibsend + request_free, could we simply always call MPI_Wait on r[4] to avoid the warning?
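
For illustration, a minimal standalone sketch of that alternative (this is not the actual rqfreeb.c test, just the same receive pattern with MPI_Wait in place of MPI_Request_free):

/*
 * Minimal sketch (not the real rqfreeb.c): waiting on the receive
 * request instead of freeing it guarantees the handle is released
 * before MPI_Finalize, so the leak warning cannot occur.
 */
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size != 2) {
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    int foo = 0;
    if (rank == 0) {
        foo = 42;
        MPI_Send(&foo, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Request r;

        MPI_Irecv(&foo, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &r);
        /* Complete the request explicitly instead of calling MPI_Request_free. */
        MPI_Wait(&r, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}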

minsii avatar Dec 14 '18 16:12 minsii

This usage is correct according to the MPI standard. Can we delay the leak check until after the netmod/shmmods are finalized?

raffenet avatar Dec 14 '18 16:12 raffenet

Sure.

minsii avatar Dec 14 '18 16:12 minsii

I think this has been fixed.

hzhou avatar Jun 05 '21 23:06 hzhou

Here is a simplified version of the test that fails consistently.

/*
 * Copyright (C) by Argonne National Laboratory
 *     See COPYRIGHT in top-level directory
 */

#include <mpi.h>
#include <assert.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    assert(size == 2);

    if (rank == 0) {
        MPI_Barrier(MPI_COMM_WORLD);

        MPI_Send(NULL, 0, MPI_DATATYPE_NULL, 1, 0, MPI_COMM_WORLD);
    }

    if (rank == 1) {
        MPI_Request r;

        /* Post a receive and free the request while it is still
         * incomplete; the barrier below keeps rank 0 from posting the
         * matching send until after the free. */
        MPI_Irecv(NULL, 0, MPI_DATATYPE_NULL, 0, 0, MPI_COMM_WORLD, &r);
        MPI_Request_free(&r);

        MPI_Barrier(MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

According to my comment (https://github.com/pmodels/mpich/issues/3469#issuecomment-447376549), past me thinks this is a correct MPI program. UCX is shutting down without completing the request, and also complaining about the leak, so delaying the checks is not a solution.

[1629493927.753398] [pmrs-centos64-240-01:152642:0]           mpool.c:44   UCX  WARN  object 0x16b8f00 was not returned to mpool ucp_requests
In direct memory block for handle type REQUEST, 1 handles are still allocated

What we could do is internally track all requests (or maybe just user-freed incomplete requests) in the netmod and try to complete them during finalize. I should confirm with the Forum that completion is expected of the MPI library in this case before we commit too much effort.
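
For illustration only, here is a user-level sketch of the same tracking idea (free_request_later and complete_pending_requests are names made up for this sketch; the real fix would do the analogous bookkeeping inside the netmod and poll its progress engine instead):

/*
 * Sketch of the tracking idea at user level (hypothetical helpers, not
 * the MPICH-internal implementation): requests "freed" while incomplete
 * are stashed, then drained with MPI_Waitall right before finalize.
 */
#include <mpi.h>

#define MAX_PENDING 64

MPI_Request pending[MAX_PENDING];
int npending = 0;

/* Stash an incomplete request instead of freeing it immediately. */
void free_request_later(MPI_Request * req)
{
    pending[npending++] = *req;
    *req = MPI_REQUEST_NULL;
}

/* Called just before MPI_Finalize: complete (and release) every
 * stashed request. */
void complete_pending_requests(void)
{
    MPI_Waitall(npending, pending, MPI_STATUSES_IGNORE);
    npending = 0;
}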

raffenet avatar Aug 20 '21 21:08 raffenet

Here's a slightly stronger example.

/*
 * Copyright (C) by Argonne National Laboratory
 *     See COPYRIGHT in top-level directory
 */

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size != 2) {
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    int foo;
    if (rank == 0) {
        MPI_Barrier(MPI_COMM_WORLD);

        foo = 42;
        MPI_Send(&foo, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Request r;

        foo = 0;
        MPI_Irecv(&foo, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &r);
        MPI_Request_free(&r);

        MPI_Barrier(MPI_COMM_WORLD);
    }

    MPI_Finalize();
    if (rank == 1) {
        /* If MPI_Finalize completed the freed receive request, this prints 42. */
        printf("foo = %d\n", foo);
    }

    return 0;
}

What should the output of this be? I'll take it up with the forum...

foo = 42

raffenet avatar Aug 20 '21 21:08 raffenet

What should the output of this be? I'll take it up with the forum...

foo = 42

I think it should be undefined. But still, we shouldn't leak memory.

hzhou avatar Aug 20 '21 23:08 hzhou

I edited the test a bit and posted it on the MPI Forum Slack. As I suspected, the Forum believes this is a correct program and that the output is well defined. In short, MPI_FINALIZE guarantees completion of the freed request.

raffenet avatar Aug 23 '21 18:08 raffenet

The handle leak can still happen with the ch4:ofi netmod. Unlike UCX, libfabric will happily close its resources without checking for pending communications, so we will need a mechanism of our own to keep track of pending requests.

It can be reproduced with the current main branch: build with --enable-g=all and run the test in a loop. The leak failure shows up roughly once in 100 runs. If we replace all MPI_Wait calls with MPI_Request_free in the test, the leak reproduces in about half of the runs.

I think we can simply keep a per-VCI request-pool counter and poll progress in finalize until the counter reaches zero. But that also means an (incorrect) program will hang at finalize, waiting for communication that never completes.
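
A rough sketch of that counter idea (every name here, e.g. vci_pending and progress_vci, is a placeholder invented for the sketch, not a current MPICH symbol):

/*
 * Hypothetical per-VCI accounting, not the actual MPICH code: each
 * active request increments its VCI's counter on allocation and
 * decrements it on completion; finalize polls progress until all
 * counters drop to zero.
 */
#define NUM_VCIS 8

static int vci_pending[NUM_VCIS];       /* assumed protected by the per-VCI lock */

void request_alloc_hook(int vci)        /* hypothetical: request created on this VCI */
{
    vci_pending[vci]++;
}

void request_complete_hook(int vci)     /* hypothetical: request completed */
{
    vci_pending[vci]--;
}

extern void progress_vci(int vci);      /* hypothetical: one progress pass */

/* Drain loop in finalize. Note: an (incorrect) program that freed a
 * request whose communication can never complete will spin here forever. */
void drain_pending_requests(void)
{
    for (int vci = 0; vci < NUM_VCIS; vci++) {
        while (vci_pending[vci] != 0)
            progress_vci(vci);
    }
}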

Just hit this failure:

not ok 1675 - ./pt2pt/rqfreeb 4
  ---
  Directory: ./pt2pt
  File: rqfreeb
  Num-procs: 4
  Timeout: 180
  Date: "Mon May 23 15:14:39 2022"
  ...
## Test output (expected 'No Errors'):
##  No Errors
## In direct memory block for handle type REQUEST, 1 handles are still allocated
## In direct memory block for handle type COMM, 2 handles are still allocated
## [1] 360 at [0x2bceec0], src/mpi/coll/src/csel.c[686]
## [1] 360 at [0x2bceca0], src/mpi/coll/src/csel.c[686]
## [1] 360 at [0x2bcea80], src/mpi/coll/src/csel.c[686]
## [1] 360 at [0x2bce860], src/mpi/coll/src/csel.c[686]
## [1] 360 at [0x2bce640], src/mpi/coll/src/csel.c[686]
## [1] 360 at [0x2bce140], src/mpi/coll/src/csel.c[686]
## [1] 360 at [0x2bcddd0], src/mpi/coll/src/csel.c[686]
## [1] 360 at [0x2bcdbb0], src/mpi/coll/src/csel.c[686]
## [1] 360 at [0x2bcd990], src/mpi/coll/src/csel.c[686]
## [1] 16 at [0x2bcd8d0], src/util/mpir_localproc.c[161]
## [1] 16 at [0x2bcd750], src/util/mpir_localproc.c[51]
## [1] 16432 at [0x2626dc0], src/mpid/ch4/src/ch4_proc.c[168]

It's easier to hit this with GPU enabled, but it can also be reproduced without GPU.

hzhou avatar May 24 '22 17:05 hzhou