
coll: update the json selection of MPIR_Bcast_intra_scatter_ring_allgather

Open hzhou opened this issue 1 year ago • 2 comments

Pull Request Description

MPIR_Bcast_intra_scatter_ring_allgather performs poorly when the per_proc_msg_size (chunk size) is too small, because the per-message latency accumulates in every round of the ring.

Fixes #7330 [skip warnings]
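
The latency argument in the description can be illustrated with a simple alpha-beta cost model. This is only a sketch under textbook assumptions (binomial scatter, p - 1 ring rounds), not MPICH code and not the actual JSON selection logic: the function names, the `min_chunk_bytes` parameter, and the values of alpha and beta are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical alpha-beta cost model; none of these names exist in MPICH.
 * alpha: per-message latency in seconds, beta: transfer time per byte. */

/* Smallest r such that 2^r >= p (number of rounds in a binomial tree). */
int ceil_log2(int p)
{
    assert(p > 0);
    int rounds = 0;
    long reach = 1;
    while (reach < p) {
        reach *= 2;
        rounds++;
    }
    return rounds;
}

/* Binomial-tree broadcast: every round forwards the whole message. */
double bcast_binomial_cost(int p, double nbytes, double alpha, double beta)
{
    return ceil_log2(p) * (alpha + nbytes * beta);
}

/* Scatter + ring allgather: the ring alone pays alpha in each of its
 * p - 1 rounds, no matter how small the per-process chunk is. */
double bcast_scatter_ring_cost(int p, double nbytes, double alpha, double beta)
{
    double latency = (ceil_log2(p) + (p - 1)) * alpha;
    double bandwidth = 2.0 * ((double) (p - 1) / p) * nbytes * beta;
    return latency + bandwidth;
}

/* Hypothetical guard: pick the ring variant only when each process's
 * chunk reaches an assumed minimum size. */
int chunk_large_enough(int p, size_t nbytes, size_t min_chunk_bytes)
{
    assert(p > 0);
    return nbytes / (size_t) p >= min_chunk_bytes;
}
```

With 64 processes and assumed values of 1 microsecond for alpha and 1 nanosecond per byte for beta, the model puts the ring variant behind the binomial tree for a 1 KiB message and ahead of it for a 16 MiB message. Real thresholds depend on the machine and would need to be measured.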

Author Checklist

  • [x] Provide Description: Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • [x] Commits Follow Good Practice: Commits are self-contained and do not do two things at once. Commit message is of the form: module: short description. Commit message explains what's in the commit.
  • [ ] Passes All Tests: Whitespace checker. Warnings test. Additional tests via comments.
  • [x] Contribution Agreement: For non-Argonne authors, check contribution agreement. If necessary, request an explicit comment from your company's PR approval manager.

hzhou, Mar 12 '25 22:03

Did you do a performance test with this, or are these changes just based on the conversation earlier this week? It's on my TODO list to do some performance runs on Aurora, so I can try testing this too.

mjwilkins18, Mar 13 '25 13:03

> Did you do a performance test with this, or are these changes just based on the conversation earlier this week? It's on my TODO list to do some performance runs on Aurora, so I can try testing this too.

Thanks for volunteering! :)

The patch addresses the obvious issue so that the algorithm doesn't perform outrageously badly. Yes, we should use tests to fine-tune the threshold.

hzhou, Mar 13 '25 15:03