
coll: add coll_attr and comm subgroups

Open hzhou opened this issue 2 years ago • 2 comments

Pull Request Description

Add a coll_attr parameter to replace the errflag parameter in the internal collective interfaces. The lower 8 bits of coll_attr are compatible with the lower 8 bits of the pt2pt attr, which avoids extra code to translate bits such as errflags when passing from collective to point-to-point. The next 8 bits hold subgroup indexes, enabling group collectives without extra subcomms (which are expensive to maintain). In the future, coll_attr may be extended to pass hints such as memory alloc kinds and algorithm choices.
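To make the bit layout concrete, here is a minimal sketch of how such an attribute word could be packed and unpacked. All names (`SKETCH_ATTR_*`, `sketch_coll_attr_*`) are hypothetical and are not taken from the MPICH source; only the split into a low error-flag byte and a following subgroup-index byte comes from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only (not the MPICH definitions):
 *   bits 0-7  : error flags, at the same positions as in the pt2pt attr
 *   bits 8-15 : subgroup index
 */
#define SKETCH_ATTR_ERR_MASK       0xffu
#define SKETCH_ATTR_SUBGROUP_SHIFT 8
#define SKETCH_ATTR_SUBGROUP_MASK  (0xffu << SKETCH_ATTR_SUBGROUP_SHIFT)

/* Pack an error flag byte and a subgroup index into one attribute word. */
static inline uint32_t sketch_coll_attr_make(unsigned errflag, unsigned subgroup)
{
    assert(errflag <= 0xffu && subgroup <= 0xffu);
    return (uint32_t) errflag | ((uint32_t) subgroup << SKETCH_ATTR_SUBGROUP_SHIFT);
}

/* Extract the subgroup index from bits 8-15. */
static inline unsigned sketch_coll_attr_subgroup(uint32_t coll_attr)
{
    return (coll_attr & SKETCH_ATTR_SUBGROUP_MASK) >> SKETCH_ATTR_SUBGROUP_SHIFT;
}

/* Because the low byte is laid out identically on both sides, going from
 * the collective attr to the pt2pt attr is a mask, not a bit translation. */
static inline uint32_t sketch_coll_attr_to_pt2pt(uint32_t coll_attr)
{
    return coll_attr & SKETCH_ATTR_ERR_MASK;
}
```

With this layout, a collective that calls into point-to-point code only needs to mask off the upper bits, and a single 8-bit field is enough to select one of up to 256 subgroups without creating a sub-communicator.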

Add a bcast smp_new algorithm that is similar to bcast smp but uses comm subgroups instead. Because we can construct lightweight custom subgroups, we can avoid the extra local send/recv or bcast step when the root is not one of the "node roots". Instead of the node_roots_comm, we can construct an inter_group made of the actual local roots.
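The choice of "actual local roots" can be sketched as follows. This is an illustration of the idea only, not the code in this PR: the function name, the `node_of` array, and the rule of using the lowest rank on nodes that do not host the root are assumptions made for the example.

```c
#include <assert.h>

/* Hypothetical helper: for each node, pick the rank that takes part in the
 * inter-node step of the broadcast.
 *   - On the node that hosts the bcast root, that rank is the root itself,
 *     so no extra intra-node hop to a fixed "node root" is needed.
 *   - On every other node, the lowest rank on the node is used (assumption).
 * node_of[r] is the node index of rank r; local_roots has num_nodes entries. */
static void sketch_pick_local_roots(const int *node_of, int comm_size,
                                    int num_nodes, int root, int *local_roots)
{
    assert(root >= 0 && root < comm_size);

    for (int n = 0; n < num_nodes; n++) {
        local_roots[n] = -1;    /* not yet chosen */
    }

    /* The root represents its own node in the inter group. */
    local_roots[node_of[root]] = root;

    /* Every other node is represented by its first (lowest) rank. */
    for (int r = 0; r < comm_size; r++) {
        int n = node_of[r];
        if (local_roots[n] == -1) {
            local_roots[n] = r;
        }
    }
}
```

For six ranks placed two per node and root rank 3, this yields the inter group {0, 3, 4}: rank 3 sends directly to ranks 0 and 4, and each of the three then broadcasts within its own node.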

NOTE: the bcast smp_new is covered in the collective cvar tests.

[skip warnings]

Author Checklist

  • [x] Provide Description: Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • [x] Commits Follow Good Practice: Commits are self-contained and do not do two things at once. Commit message is of the form "module: short description". Commit message explains what's in the commit.
  • [ ] Passes All Tests: Whitespace checker. Warnings test. Additional tests via comments.
  • [x] Contribution Agreement: For non-Argonne authors, check contribution agreement. If necessary, request an explicit comment from your company's PR approval manager.

hzhou avatar Jul 11 '23 18:07 hzhou

test:mpich/ch3/most test:mpich/ch4/most

hzhou avatar Jul 11 '23 18:07 hzhou

test:mpich/ch3/tcp test:mpich/ch4/ofi

All ✔️

hzhou avatar Aug 13 '24 00:08 hzhou

test:mpich/ch3/most test:mpich/ch4/most

hzhou avatar Aug 13 '24 21:08 hzhou