
MPI_TYPE_INDEXED + MPI_SEND/RECV slow with older InfiniBand network?

chhu opened this issue • 4 comments

Related to #12202, but without CUDA. On our shared-memory system (2x EPYC), MPI_TYPE_INDEXED performs as expected, but as soon as our 40 Gbit InfiniBand network gets involved, performance drops by a factor of 2-5. This does not happen with the same Open MPI build and contiguous (linear) buffers.

The IB link itself is fine: raw bandwidth and latency are high and as expected.

I do not see this behavior on our big HPC system with 100G IB, even with the same Open MPI version. Is there something I can tune? How does Open MPI transmit indexed types: one request per block, or does it scatter/gather into a linear buffer first? A minimal sketch of the pattern I mean is below.
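For reference, a hedged, self-contained sketch of the pattern in question (block count, block length, and stride are made up for illustration): an MPI_Type_indexed datatype exchanged with plain MPI_Send/MPI_Recv between two ranks.

```c
/* Minimal sketch of the indexed send/recv pattern discussed in this issue.
 * Block layout is hypothetical. Build: mpicc -o idx idx.c ; run with 2 ranks. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    enum { NBLOCKS = 4, BLOCKLEN = 8, STRIDE = 32 };
    double data[NBLOCKS * STRIDE];

    /* Describe NBLOCKS blocks of BLOCKLEN doubles, STRIDE elements apart. */
    int blocklens[NBLOCKS], displs[NBLOCKS];
    for (int i = 0; i < NBLOCKS; i++) {
        blocklens[i] = BLOCKLEN;
        displs[i]    = i * STRIDE;
    }

    MPI_Datatype indexed;
    MPI_Type_indexed(NBLOCKS, blocklens, displs, MPI_DOUBLE, &indexed);
    MPI_Type_commit(&indexed);

    if (rank == 0) {
        for (int i = 0; i < NBLOCKS * STRIDE; i++) data[i] = (double)i;
        MPI_Send(data, 1, indexed, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(data, 1, indexed, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Type_free(&indexed);
    MPI_Finalize();
    return 0;
}
```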

Thanks!

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

Tested on 3.1, 4.1, and 5.1 (latest)

Please describe the system on which you are running

See #12202

chhu commented Jan 03 '24

Is the performance impact of using MPI_TYPE_INDEXED on the 100G IB HPC system negligible, or just smaller than on the 40G system? I'd expect it to be noticeable on any system, since UCX does not use certain protocols when the data is not contiguous.

brminich commented Jan 16 '24

The only thing I can say is that on 100G IB, MPI_TYPE_INDEXED has no notable impact, while on 40G it has a major one. Are you suggesting one should avoid non-contiguous data exchange?

chhu commented Jan 25 '24

Yes, using non-contiguous data may imply some limitations at the MPI/UCX/network-protocol level.

brminich commented Jan 25 '24

Hmm, maybe it would be a nice feature to linearize into a temporary contiguous buffer before the exchange? Perhaps let the user control this via a threshold setting? Something like the sketch below, done inside the library instead of by the user.
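As a user-side version of that idea, here is a hedged sketch (hypothetical helper name and threshold, not an Open MPI feature): pack the indexed data into a contiguous scratch buffer with MPI_Pack and send it as MPI_PACKED once the packed size exceeds a threshold. The receiver would do the mirror image: MPI_Recv into a scratch buffer and MPI_Unpack with the same indexed datatype.

```c
/* Hypothetical workaround sketch: linearize an indexed type before sending.
 * send_maybe_packed() and pack_threshold are illustrative names only. */
#include <mpi.h>
#include <stdlib.h>

/* Send 'count' elements of datatype 'indexed' from 'data' to rank 'dest',
 * packing into a contiguous buffer first when the packed size is large. */
static void send_maybe_packed(const void *data, int count, MPI_Datatype indexed,
                              int dest, int tag, MPI_Comm comm,
                              int pack_threshold)
{
    int packed_size;
    MPI_Pack_size(count, indexed, comm, &packed_size);

    if (packed_size <= pack_threshold) {
        /* Small message: let the MPI library handle the datatype directly. */
        MPI_Send(data, count, indexed, dest, tag, comm);
        return;
    }

    /* Large message: linearize into a scratch buffer, then send that. */
    char *buf = malloc((size_t)packed_size);
    int pos = 0;
    MPI_Pack(data, count, indexed, buf, packed_size, &pos, comm);
    MPI_Send(buf, pos, MPI_PACKED, dest, tag, comm);
    free(buf);
}
```

Whether this helps depends on whether the extra copy is cheaper than sending many small non-contiguous fragments on the slower fabric, which is exactly the trade-off a threshold setting would expose.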

chhu commented Jan 26 '24