ch4/ofi: Add support for NIC assignment for SNC4 mode for Aurora
Pull Request Description
This PR adds a preferred NIC assignment for ranks mapped to different sub-NUMA nodes when the CPU is in SNC4 mode. The implementation is specific to the Aurora node layout.
The PR also adds helper functions to identify the SNC4 nodes (reported as groups by hwloc) and to find the closest NICs for ranks on a specific SNC node.
The previous round-robin NIC assignment is preserved for non-SNC mode.
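For illustration only, the selection policy described above could be sketched roughly as follows. This is not the code in this PR: `choose_preferred_nic`, `pick_nic_in_group`, `NO_SNC_GROUP`, and the `nic_group` array (the index of the SNC group closest to each NIC) are hypothetical names, and the sketch assumes the topology lookup has already been done and that `local_rank` is non-negative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker: the rank is not bound to any SNC group (non-SNC mode). */
#define NO_SNC_GROUP (-1)

/* Return the index of the (k mod count)-th NIC whose closest SNC group is
 * `group`, or -1 if no NIC is close to that group. */
static int pick_nic_in_group(const int *nic_group, int num_nics, int group, int k)
{
    int count = 0;
    for (int i = 0; i < num_nics; i++) {
        if (nic_group[i] == group)
            count++;
    }
    if (count == 0)
        return -1;

    int target = k % count;
    for (int i = 0; i < num_nics; i++) {
        if (nic_group[i] == group) {
            if (target == 0)
                return i;
            target--;
        }
    }
    return -1;  /* not reached when count > 0 */
}

/* Choose a preferred NIC for a rank.
 * nic_group[i] : SNC group closest to NIC i (assumed to be precomputed)
 * rank_group   : SNC group the rank is bound to, or NO_SNC_GROUP
 * local_rank   : node-local rank, used to spread ranks across candidates */
int choose_preferred_nic(const int *nic_group, int num_nics, int rank_group, int local_rank)
{
    if (nic_group == NULL || num_nics <= 0)
        return -1;

    if (rank_group != NO_SNC_GROUP) {
        int nic = pick_nic_in_group(nic_group, num_nics, rank_group, local_rank);
        if (nic >= 0)
            return nic;
    }
    /* Non-SNC mode, or no NIC close to this group: plain round robin. */
    return local_rank % num_nics;
}
```

In this sketch, a rank whose SNC group has no nearby NIC falls back to round robin as well; whether the PR handles that case the same way is an assumption.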
Author Checklist
- [x] Provide Description
  - Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
- [x] Commits Follow Good Practice
  - Commits are self-contained and do not do two things at once.
  - Commit message is of the form: module: short description
  - Commit message explains what's in the commit.
- [x] Passes All Tests
  - Whitespace checker. Warnings test. Additional tests via comments.
- [x] Contribution Agreement
  - For non-Argonne authors, check contribution agreement. If necessary, request an explicit comment from your company's PR approval manager.
test:mpich/ch4/ofi
@hzhou The tests passed.
test:mpich/ch4/ofi
test:mpich/ch4/ofi
test:mpich/ch3/tcp
Thanks for the review, @hzhou! The latest testing passed prior to rebasing (ch4 testing, ch3 testing), so I think we can merge once the basic checks complete.
Merging the branch since the tests passed and it is rebased on top of the latest main.