mpich

ch4/posix: using polling loop for recv and send queue

Open yfguo opened this issue 2 months ago • 3 comments

Try polling the recv and send queues multiple times per progress call to improve message latency. This helps use cases with multiple incoming messages or deferred sends. Two CVARs are added to control the maximum number of loops in one progress call. If either queue is empty, POSIX skips the remaining iterations. The CVARs default to 1, which disables multi-polling.
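As a rough illustration of the bounded polling described above, here is a minimal sketch in C. It is not the actual ch4/posix code: `demo_queue_t`, `demo_poll_once`, `demo_poll_queue`, `demo_progress`, and the two `*_max_loops` parameters are hypothetical names standing in for the real queues and CVARs, and the queue is reduced to a counter of pending items.

```c
/* Hypothetical stand-in for a shared-memory queue: it only counts pending items. */
typedef struct {
    int pending;
} demo_queue_t;

/* Handle one item if there is one.
 * Returns 1 if an item was processed, 0 if the queue was empty. */
static int demo_poll_once(demo_queue_t *q)
{
    if (q->pending == 0)
        return 0;
    q->pending--;
    return 1;
}

/* Poll one queue at most max_loops times and stop early once it is empty.
 * Returns the number of items processed. With max_loops == 1 this is
 * plain single polling. */
static int demo_poll_queue(demo_queue_t *q, int max_loops)
{
    int processed = 0;
    for (int i = 0; i < max_loops; i++) {
        if (!demo_poll_once(q))
            break;              /* queue is empty: skip the remaining iterations */
        processed++;
    }
    return processed;
}

/* One progress call: bounded polling of the recv queue, then the send queue.
 * The two limits play the role of the two CVARs (assumed, not the real names). */
static int demo_progress(demo_queue_t *recvq, demo_queue_t *sendq,
                         int recv_max_loops, int send_max_loops)
{
    return demo_poll_queue(recvq, recv_max_loops)
         + demo_poll_queue(sendq, send_max_loops);
}
```

With both limits at 1, each progress call handles at most one item per queue. With both at 8, a progress call can drain up to eight items per queue, but returns as soon as a queue runs dry, so an idle queue costs one empty check.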

I got some unexpected results. Testing with test/mpi/bench/p2p_bw showed improved throughput with the send/recv loops set to 8.

image

But when I try OSU_BW, the results are inverted. I need to understand why this is the case.

image

Pull Request Description

Author Checklist

  • [ ] Provide Description: Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • [ ] Commits Follow Good Practice: Commits are self-contained and do not do two things at once. Commit message is of the form: module: short description. Commit message explains what's in the commit.
  • [ ] Passes All Tests: Whitespace checker. Warnings test. Additional tests via comments.
  • [ ] Contribution Agreement: For non-Argonne authors, check contribution agreement. If necessary, request an explicit comment from your company's PR approval manager.

yfguo, Oct 25 '25 03:10

I got some unexpected results. Testing with test/mpi/bench/p2p_bw showed improved throughput with the send/recv loops set to 8.

image

But when I try OSU_BW, the results are inverted. I need to understand why this is the case.

image

yfguo, Oct 30 '25 14:10

I also changed the default of the CVARs to 1. This way, multi-polling becomes an opt-in feature.

yfguo, Nov 05 '25 20:11

The usefulness of the patch needs more investigation.

yfguo, Nov 26 '25 18:11