coll: improve alltoallv
Pull Request Description
-
add MPIR_CVAR_CH4_PROGRESS_THROTTLE. Q: should we always enable progress THROTTLE?
-
The naive linear pairing stalls the high ranks until the lower ranks reach them: rank N-1 is blocked at its first exchange until rank 0 is nearly finished.
Slightly improve the algorithm, especially for the high-PPN (processes per node) case: do pairwise exchanges within each node first, then finish the remaining internode exchanges with the naive pairing.
Also, the existing double loop followed by rank selection seems to be a roundabout way of writing a single loop.
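To make the ordering concrete, here is a minimal sketch in plain C (no MPI calls) of the peer order each rank sees. It is an illustration, not the code in this PR: the function names are hypothetical, and `node_first_order` assumes ranks are laid out in consecutive blocks of `ppn` per node, which real communicators need not satisfy.

```c
#include <assert.h>

/* Peer order that `rank` sees under a double loop over (i, j) with i <= j,
 * where rank i pairs with j and rank j pairs with i. */
int naive_double_loop(int rank, int size, int *peers)
{
    int n = 0;
    for (int i = 0; i < size; i++) {
        for (int j = i; j < size; j++) {
            if (rank == i) {
                peers[n++] = j;     /* also covers i == j (self) */
            } else if (rank == j) {
                peers[n++] = i;
            }
        }
    }
    return n;
}

/* The same order written as a single loop: every rank simply walks
 * peers 0..size-1, so `rank` does not affect the result. */
int naive_single_loop(int rank, int size, int *peers)
{
    (void) rank;
    for (int peer = 0; peer < size; peer++)
        peers[peer] = peer;
    return size;
}

/* Node-first order. ASSUMPTION: consecutive blocks of `ppn` ranks per node. */
int node_first_order(int rank, int size, int ppn, int *peers)
{
    assert(ppn > 0 && rank >= 0 && rank < size);
    int n = 0;
    int node_lo = (rank / ppn) * ppn;
    int node_hi = node_lo + ppn < size ? node_lo + ppn : size;
    for (int peer = node_lo; peer < node_hi; peer++)
        peers[n++] = peer;          /* intranode peers first */
    for (int peer = 0; peer < size; peer++)
        if (peer < node_lo || peer >= node_hi)
            peers[n++] = peer;      /* then the naive order over other nodes */
    return n;
}
```

In the naive order every rank starts with peer 0, which is why the highest rank waits on rank 0. With 8 ranks and `ppn = 4`, `node_first_order` gives rank 5 the order 4, 5, 6, 7, 0, 1, 2, 3.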
-
A better pairing: select sendrecv pairs by bit flipping. Each rank exchanges with itself first, then its immediate neighbor, then neighbors at increasing bit distances. If the ranks on each node are consecutive and the number of processes per node is a power of 2, this captures the node-first pairing of the previous algorithm.
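One common way to realize bit-flip pairing is to XOR the rank with an increasing mask. The sketch below assumes that approach; the function name is hypothetical and the PR's actual loop may differ, in particular in how it handles communicator sizes that are not a power of 2 (here, partners that do not exist are skipped).

```c
#include <assert.h>

/* Fill `peers` with the exchange order for `rank` using XOR (bit-flip)
 * pairing. Returns the number of peers written, which equals `size`. */
int bitflip_order(int rank, int size, int *peers)
{
    assert(size > 0 && rank >= 0 && rank < size);
    int pof2 = 1;
    while (pof2 < size)
        pof2 <<= 1;                 /* smallest power of 2 >= size */
    int n = 0;
    for (int mask = 0; mask < pof2; mask++) {
        int peer = rank ^ mask;     /* mask 0 is self, mask 1 the immediate neighbor */
        if (peer < size)
            peers[n++] = peer;      /* skip partners beyond the communicator */
    }
    return n;
}
```

Because `peer = rank ^ mask` implies `rank = peer ^ mask`, both sides of a pair select each other at the same step. With 8 ranks, rank 5 visits 5, 4, 7, 6, 1, 0, 3, 2, so with 4 consecutive ranks per node its first four exchanges stay on the node holding ranks 4 to 7.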
-
[ ] The same optimization should apply to the linear pairwise algorithms in alltoall and alltoallw. The code looks like it needs refactoring.
[skip warnings]
Author Checklist
- [x] Provide Description Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
- [x] Commits Follow Good Practice
Commits are self-contained and do not do two things at once.
Commit message is of the form:
module: short description
Commit message explains what's in the commit.
- [ ] Passes All Tests Whitespace checker. Warnings test. Additional tests via comments.
- [x] Contribution Agreement For non-Argonne authors, check contribution agreement. If necessary, request an explicit comment from your company's PR approval manager.
test:mpich/ch3/most test:mpich/ch4/most
test:mpich/custom env: MPIR_CVAR_ALLTOALLV_PAIRWISE_NEW=1
The tests passed