
bug: ch4/ofi/psm2: poor put/get performance when both origin and target datatypes are noncontig

Open minsii opened this issue 7 years ago • 3 comments

When RMA put or get is used to implement the halo exchange in a 2D stencil, the east/west exchange performs much worse than the same exchange implemented with send/recv.

Below are the performance numbers for 4 interconnected processes on Argonne Bebop (Broadwell + OmniPath).
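To illustrate why only the east/west exchange involves a noncontiguous datatype, here is a minimal sketch in plain C. It assumes a row-major grid with a one-cell halo on each side, which is an assumption about the stencil code, not something taken from it; `cell_index` and `pack_column` are hypothetical helper names. In that layout a north/south halo row is a single contiguous run of memory, while an east/west halo column touches one element per row, `width` elements apart.

```c
#include <assert.h>
#include <stddef.h>

/* Index of cell (row, col) in a row-major grid that is `width` cells wide.
 * Row-major layout is an assumption of this sketch. */
size_t cell_index(size_t row, size_t col, size_t width)
{
    return row * width + col;
}

/* Copy the interior part of one column (rows 1 .. height-2, skipping the
 * halo rows) into a contiguous buffer. Consecutive elements of the column
 * are `width` doubles apart in memory, so this is a strided access.
 * Returns the number of elements copied. */
size_t pack_column(const double *grid, size_t width, size_t height,
                   size_t col, double *out)
{
    size_t n = 0;

    assert(col < width);
    for (size_t row = 1; row + 1 < height; row++) {
        out[n++] = grid[cell_index(row, col, width)];
    }
    return n;
}
```

For a 4x4 grid, `pack_column(grid, 4, 4, 2, out)` reads the elements at offsets 6 and 10, that is, a stride of 4. An MPI program would typically describe such a column with a strided derived datatype (for example one built with `MPI_Type_vector`) rather than packing it by hand.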

| Problem Size | PUT contig (north+south) | PUT noncontig (east+west) | PT2PT contig | PT2PT noncontig |
|---|---|---|---|---|
| 64 | 2.7076 | 24.7242 | 4.5245 | 3.5107 |
| 128 | 2.7854 | 54.9428 | 4.4331 | 2.2331 |
| 256 | 3.006 | 109.952 | 4.4955 | 2.387 |
| 512 | 3.3544 | 233.4858 | 2.7368 | 41.7567 |
| 1024 | 4.0334 | 501.6856 | 3.425 | 81.9862 |
| 2048 | 5.393 | 1003.2947 | 5.0462 | 162.1536 |
| 4096 | 13.8641 | 2005.5851 | 11.5831 | 336.516 |
| 8192 | 15.0235 | 4758.1552 | 16.3524 | 668.5294 |
| 16384 | 27.9602 | 9287.5796 | 24.4589 | 1343.4111 |
| 32768 | 46.1634 | 20758.3982 | 44.3566 | 2651.3244 |

In summary, the noncontig part shows up to 10x worse performance when using PUT (the same holds for GET). This might be a performance issue in PUT/GET when both the origin and target datatypes are noncontiguous.

Further investigation might also be needed for RMA over SHM.
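The per-size slowdown can be worked out directly from the two noncontig columns reported above. The following is a small sketch of that arithmetic; the array and function names (`put_noncontig`, `pt2pt_noncontig`, `slowdown`, `worst_case_index`) are made up for illustration, and the values are copied from the numbers in this issue.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_SIZES = 10 };

/* Values copied from the issue: problem size and the two noncontig columns. */
const int problem_size[NUM_SIZES] = {
    64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768
};
const double put_noncontig[NUM_SIZES] = {
    24.7242, 54.9428, 109.952, 233.4858, 501.6856,
    1003.2947, 2005.5851, 4758.1552, 9287.5796, 20758.3982
};
const double pt2pt_noncontig[NUM_SIZES] = {
    3.5107, 2.2331, 2.387, 41.7567, 81.9862,
    162.1536, 336.516, 668.5294, 1343.4111, 2651.3244
};

/* Ratio of PUT noncontig to PT2PT noncontig for the i-th problem size. */
double slowdown(size_t i)
{
    assert(i < NUM_SIZES);
    return put_noncontig[i] / pt2pt_noncontig[i];
}

/* Index of the problem size with the largest PUT/PT2PT ratio. */
size_t worst_case_index(void)
{
    size_t worst = 0;

    for (size_t i = 1; i < NUM_SIZES; i++) {
        if (slowdown(i) > slowdown(worst)) {
            worst = i;
        }
    }
    return worst;
}
```

With these numbers the ratio stays below 10x for most problem sizes, but it is considerably higher at sizes 128 and 256, where the PT2PT noncontig time is unusually small.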

minsii, Jul 28 '18 18:07

Tagging @shawnccx @nusislam

hajimefu, Jul 30 '18 14:07

@minsii how can I re-run the benchmarks used in the issue description? I would like to compare against the performance of the current master, which includes the RMA non-contig changes.

raffenet, Jun 03 '20 16:06

I think I used the MPI tutorial code. Can you try this: stencil_lock_put.c

minsii, Jun 04 '20 03:06