MPIX_Stream posix_progress.h Assertion Triggered

Open ryandeng1 opened this issue 8 months ago • 6 comments

Hi,

This is a continuation of the question here: https://github.com/pmodels/mpich/issues/7279.

Does the assertion at https://github.com/pmodels/mpich/blob/5c2809758a830b4e054cc293ca4227afe33b574d/src/mpid/ch4/shm/posix/posix_progress.h#L71 indicate that you cannot have multiple different MPIX streams simultaneously receiving from the same rank?

For example, suppose rank 1 has two streams, A and B, both receiving from rank 0, and I call MPIX_Stream_irecv on stream A and then on stream B (serially, one after the other, since these operations are asynchronous). Would that trigger this error? I understand from the previous issues that two threads cannot access the same stream simultaneously, but does this assertion also mean that two streams on rank 1 cannot simultaneously receive data from the same rank (rank 0 in this example)?

Thanks in advance.

ryandeng1 avatar Jun 30 '25 08:06 ryandeng1


                MPIR_Assert(MPIDI_POSIX_global.
                            per_vci[vci].active_rreq[transaction.src_local_rank] == NULL);
                MPIDI_POSIX_global.per_vci[vci].active_rreq[transaction.src_local_rank] = rreq;

The vci index identifies a unique MPIX stream.

The assertion checks for messaging protocol consistency. When the initial packet of a pipelined message arrives, the previous message from the same source (in the same stream) must already be completed.

hzhou avatar Jun 30 '25 22:06 hzhou
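The invariant described in the reply above can be illustrated with a small stand-alone model. This is only a sketch in plain C, not MPICH code: the names `recv_req`, `on_first_packet`, `on_next_packet`, and the table sizes are hypothetical, and the model returns -1 where MPICH would fire `MPIR_Assert`. It assumes, as the quoted snippet suggests, one "active receive" slot per (vci, source local rank) pair.

```c
#include <stddef.h>

#define MAX_VCIS 2          /* hypothetical: number of streams in the model */
#define MAX_LOCAL_RANKS 4   /* hypothetical: number of ranks on the node */

/* Stand-in for a receive request; only tracks how much data is still expected. */
typedef struct {
    int bytes_left;
} recv_req;

/* One slot per (vci, source local rank), loosely mirroring active_rreq above. */
static recv_req *active_rreq[MAX_VCIS][MAX_LOCAL_RANKS];

/* Initial packet of a pipelined message. The slot for this (vci, src) pair
 * must be empty, i.e. the previous pipelined message from that source on
 * this stream has completed. Returns 0 on success, -1 on a violation. */
int on_first_packet(int vci, int src, recv_req *rreq, int chunk)
{
    if (active_rreq[vci][src] != NULL)
        return -1;                      /* MPICH asserts at this point */
    rreq->bytes_left -= chunk;
    if (rreq->bytes_left > 0)
        active_rreq[vci][src] = rreq;   /* more packets are expected */
    return 0;
}

/* Follow-up packet: consume data and release the slot once the message is done. */
int on_next_packet(int vci, int src, int chunk)
{
    recv_req *rreq = active_rreq[vci][src];
    if (rreq == NULL)
        return -1;                      /* no pipelined message in flight */
    rreq->bytes_left -= chunk;
    if (rreq->bytes_left <= 0)
        active_rreq[vci][src] = NULL;   /* message complete, slot is free again */
    return 0;
}
```

In this model, a second initial packet from source 1 on vci 0 is rejected while the first message is still in flight, whereas an initial packet from the same source on vci 1 is accepted because it uses a different slot. Under the stated assumptions, that matches the reading that the check is per stream, not across streams.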

Thanks for the response. What does this mean in terms of what I can and cannot do with streams? I assume this means that on the same stream, I cannot call MPIX_Stream_irecv with the same source/destination processes twice in a row?

ryandeng1 avatar Jul 06 '25 09:07 ryandeng1

Yes, you can, just as you can call MPI_Irecv with the same source/destination process twice in a row. The only rule is you shouldn't access the same stream (send or recv) concurrently from multiple threads.

hzhou avatar Jul 07 '25 15:07 hzhou

Also, streams are not progressed or ordered relative to each other, so watch out for potential deadlocks due to race conditions between different streams.

hzhou avatar Jul 07 '25 15:07 hzhou

I see. I am somehow getting this error only when I call MPI_Irecv on the same stream with the same source/destination process twice in a row and then call progress on that stream later on.

A related question: if I have multiple operations on one stream, does MPIX_Stream_progress progress all of those operations, or does it progress the first operation and only start progressing the second one once the first has finished?

ryandeng1 avatar Jul 10 '25 08:07 ryandeng1

MPIX_Stream_progress will progress all operations on the stream.

hzhou avatar Jul 10 '25 16:07 hzhou
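To make the difference between "progress all operations" and "progress only the first" concrete, here is a toy model in plain C. It is a sketch under loud assumptions: `toy_stream`, `toy_post`, and `toy_progress` are hypothetical names, each operation is reduced to a counter of remaining steps, and nothing here reflects MPICH internals or MPI matching rules. It only shows what it means for one progress call to advance every pending operation on a stream.

```c
#include <stddef.h>

#define MAX_OPS 8   /* hypothetical capacity of the toy stream */

/* Toy stream: each pending operation is just a count of remaining steps. */
typedef struct {
    int steps_left[MAX_OPS];
    int count;
} toy_stream;

/* Post an operation that needs `steps` progress steps to complete.
 * Returns the operation's index, or -1 if the toy stream is full. */
int toy_post(toy_stream *s, int steps)
{
    if (s->count >= MAX_OPS)
        return -1;
    s->steps_left[s->count] = steps;
    return s->count++;
}

/* One progress call advances EVERY unfinished operation by one step,
 * not just the oldest one. Returns how many operations are still pending. */
int toy_progress(toy_stream *s)
{
    int pending = 0;
    for (int i = 0; i < s->count; i++) {
        if (s->steps_left[i] > 0)
            s->steps_left[i]--;
        if (s->steps_left[i] > 0)
            pending++;
    }
    return pending;
}
```

With an operation needing two steps posted first and one needing a single step posted second, one call to `toy_progress` finishes the second operation while the first is still pending. A "first operation only" scheme would instead leave the second untouched until the first completed.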