Optimization of MPI layer
Original issue: https://charm.cs.illinois.edu/redmine/issues/23
Revisit MPI layer and possibly rewrite to improve performance.
Original date: 2013-02-06 06:29:03
I've heard through the grapevine (not to be confused with Harshitha's LB strategy) that Ralf has achieved near-parity with some native layer in his work at Argonne for Pavan. Marking it 'in-progress' accordingly.
Original date: 2013-02-11 22:24:31
Quoting Ralf's email on this:
As for MPI, we've found something that apparently makes it as fast as gemini_gni-crayxe, and the same seems to be true of ibverbs. We're testing this on Intrepid and Vesta as well. Pavan doesn't care about TCP (i.e., MPI over sockets) at all, which is where the performance difference is biggest (almost 4x).
Original date: 2017-01-17 03:29:53
It looks like none of the work mentioned above was ever merged...
Original date: 2018-04-29 22:45:56
Small optimization to use MPI-3's MPI_Mprobe and MPI_Mrecv where possible: ~~https://charm.cs.illinois.edu/gerrit/#/c/charm/+/2785/~~ https://github.com/UIUC-PPL/charm/commit/f7fbaaaae1088ee65988c1345203da3ed55abcc4
Edit: we ended up reverting this because support for MPI_Mprobe is spotty and not really detectable at configure time in a portable manner.
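For reference, the matched-probe pattern that commit switched to looks roughly like the following (a simplified sketch, not the actual machine-layer code): MPI_Mprobe atomically dequeues the matched message, so the subsequent MPI_Mrecv on the returned handle cannot race with other probes or receives, unlike the older MPI_Iprobe + MPI_Recv pairing.

```c
#include <mpi.h>
#include <stdlib.h>

/* Matched probe + receive of a message of unknown size (simplified sketch). */
static char *recv_any(MPI_Comm comm, int *out_len)
{
    MPI_Message msg;
    MPI_Status  status;
    int         count;

    /* Blocks until a message matches and atomically dequeues it, so no other
     * thread or receive call can steal it between the probe and the recv. */
    MPI_Mprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, comm, &msg, &status);
    MPI_Get_count(&status, MPI_BYTE, &count);

    char *buf = malloc(count);
    /* Receives exactly the message returned by the matched probe. */
    MPI_Mrecv(buf, count, MPI_BYTE, &msg, &status);

    *out_len = count;
    return buf;
}
```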
I reached out to Ralf in 2018 about these patches and they are presumed lost. However he did mention that "IIRC the code was mostly using MPI3 RMA operations in a fairly natural way."
Wow, it was not even saved on a branch?
Not if it was never pushed out of his local repository.
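For context, "MPI3 RMA operations in a fairly natural way" presumably refers to the one-sided MPI_Put/MPI_Get path with passive-target synchronization. A minimal illustration of those calls (purely hypothetical; in no way a reconstruction of the lost patches):

```c
#include <mpi.h>

/* One-sided delivery into a remote window (illustrative only; window
 * creation and destruction are collective over comm). */
static void rma_put_example(MPI_Comm comm, int target, const char *payload, int len)
{
    MPI_Win  win;
    char    *base;
    MPI_Aint win_size = 1 << 20;   /* arbitrary 1 MiB landing zone */

    /* Every rank exposes a buffer that peers can write into directly. */
    MPI_Win_allocate(win_size, 1, MPI_INFO_NULL, comm, &base, &win);

    /* Passive-target epoch: the target makes no matching call. */
    MPI_Win_lock_all(0, win);
    MPI_Put(payload, len, MPI_BYTE, target, 0 /* displacement */,
            len, MPI_BYTE, win);
    MPI_Win_flush(target, win);    /* remote completion of the Put */
    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);
}
```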
Here are some minor MPI-layer optimizations we've done in the recent-ish past, with notes on their status and motivations (a combined sketch of the first three appears after the list):

- Add a build-time option to use MPI_Alloc_mem inside CmiAlloc. This can potentially speed up messaging, but it is off by default pending more performance testing and may need more pooling of messages: https://github.com/UIUC-PPL/charm/commit/724b2a02084e288bbd6cbdd22b60891b504ec333
- Allow enabling preposted receives with a build-time flag. This is potentially an improvement for small/medium-sized messages but may require more tuning of parameters, so it is still off by default: https://github.com/UIUC-PPL/charm/commit/868e39ad0738dbb3a32cca651919819cd1064a10
- Add an MPI_Info assertion on the Charm communicator stating that Charm does not require MPI message ordering. An MPI implementation that optimizes for this case could exploit it, but no known implementations do so yet: https://github.com/UIUC-PPL/charm/commit/cb4d760c4eaf478d80e2873e4dfd243933042744
- Use the MPI-3 matching MPI_Mprobe/MPI_Mrecv functions (see the sketch above). This potentially decreases MPI queue access times, but it was found to be implemented incorrectly in multiple MPI implementations, so we had to revert it for correctness. Also, since we almost always do wildcard receives, our MPI queue access times are minimal anyway because we just take whatever is at the head of the queue: https://github.com/UIUC-PPL/charm/commit/f7fbaaaae1088ee65988c1345203da3ed55abcc4
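Here is a condensed, hypothetical sketch of what the first three options above amount to in plain MPI: message buffers from MPI_Alloc_mem, a ring of preposted wildcard receives, and the allow-overtaking info assertion on the Charm communicator. This is not the actual machine layer; the buffer sizes, prepost count, and tag are made-up parameters, and the info key shown is the MPI-4 standardized assertion name, which may differ from what the commit used.

```c
#include <mpi.h>

#define NUM_PREPOSTED  8        /* hypothetical tuning parameter */
#define PREPOST_MAX_SZ 65536    /* hypothetical small/medium message cutoff */
#define CHARM_TAG      1        /* placeholder tag */

static MPI_Comm    charm_comm;
static MPI_Request prepost_req[NUM_PREPOSTED];
static char       *prepost_buf[NUM_PREPOSTED];

/* CmiAlloc-style allocation backed by MPI_Alloc_mem so the MPI
 * implementation can return registered (pinned) memory; pair with
 * MPI_Free_mem() on release. */
static void *alloc_msg(MPI_Aint size)
{
    void *p;
    MPI_Alloc_mem(size, MPI_INFO_NULL, &p);
    return p;
}

static void layer_init(void)
{
    MPI_Comm_dup(MPI_COMM_WORLD, &charm_comm);

    /* Assert that Charm++ does not depend on MPI message ordering.
     * Implementations that don't optimize for this simply ignore it. */
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "mpi_assert_allow_overtaking", "true");
    MPI_Comm_set_info(charm_comm, info);
    MPI_Info_free(&info);

    /* Prepost a ring of wildcard receives so small/medium messages land
     * directly in user buffers instead of the unexpected-message queue. */
    for (int i = 0; i < NUM_PREPOSTED; i++) {
        prepost_buf[i] = alloc_msg(PREPOST_MAX_SZ);
        MPI_Irecv(prepost_buf[i], PREPOST_MAX_SZ, MPI_BYTE, MPI_ANY_SOURCE,
                  CHARM_TAG, charm_comm, &prepost_req[i]);
    }
}

/* Progress-loop fragment: service a completed preposted receive, copy the
 * message out, then repost the same buffer. */
static void poll_preposted(void)
{
    int idx, flag;
    MPI_Status st;
    MPI_Testany(NUM_PREPOSTED, prepost_req, &idx, &flag, &st);
    if (flag && idx != MPI_UNDEFINED) {
        /* ...copy prepost_buf[idx] (length via MPI_Get_count) into a message
         * from alloc_msg() and hand it to the scheduler, then repost: */
        MPI_Irecv(prepost_buf[idx], PREPOST_MAX_SZ, MPI_BYTE, MPI_ANY_SOURCE,
                  CHARM_TAG, charm_comm, &prepost_req[idx]);
    }
}
```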