ch4/ipc: refactor IPC and add CMA module
Pull Request Description
This is a temporary PR for reference. It will be rebased and possibly split into separate PRs.
[skip warnings]
TODO:
- [ ] Should we enable CMA by default? Many distributions default the ptrace scope to "1", which will result in `EPERM` in `process_vm_readv`. Thus, enabling it by default will raise many support issues.
- [ ] `--with-ch4-shmmods=posix,xpmem,cma,gpudirect` is unintuitive. I think it is better to use individual options, e.g. `--with-cma`. We already use `--with-xpmem` and `--with-cuda` (and `--without-` to disable).

EDIT: address these in #7040
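For reference while deciding the default, here is a minimal sketch of how a runtime could inspect the ptrace scope before relying on `process_vm_readv` between sibling ranks. It is not code from this PR. The helper names `read_ptrace_scope` and `cma_peer_access_likely` are hypothetical, and the sketch assumes the scope is exposed by the Yama LSM at `/proc/sys/kernel/yama/ptrace_scope`, the usual location on Linux.

```c
#include <stdio.h>

/* Usual location of the Yama ptrace scope setting on Linux (assumed here). */
#define YAMA_PTRACE_SCOPE_PATH "/proc/sys/kernel/yama/ptrace_scope"

/* Hypothetical helper: return the ptrace scope read from `path`,
 * or -1 if the file is missing or cannot be parsed (e.g. Yama is absent). */
int read_ptrace_scope(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    int scope = -1;
    if (fscanf(fp, "%d", &scope) != 1)
        scope = -1;
    fclose(fp);
    return scope;
}

/* Hypothetical helper: return 1 if unrelated processes of the same user can
 * be expected to read each other's memory, 0 otherwise.
 *   -1 : Yama not present, classic same-uid ptrace rules apply
 *    0 : classic ptrace permissions
 *   >=1: restricted (descendants only, admin only, or no attach), so a
 *        sibling-to-sibling process_vm_readv is expected to fail with EPERM */
int cma_peer_access_likely(int scope)
{
    return scope <= 0;
}
```

A caller could evaluate `cma_peer_access_likely(read_ptrace_scope(YAMA_PTRACE_SCOPE_PATH))` at init time and fall back to another shmmod when it returns 0. The check is only a heuristic: capabilities such as `CAP_SYS_PTRACE`, a `prctl(PR_SET_PTRACER, ...)` call by the peer, or other security modules can change the actual outcome.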
Author Checklist
- [x] Provide Description Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
- [x] Commits Follow Good Practice
Commits are self-contained and do not do two things at once.
Commit message is of the form: `module: short description`
Commit message explains what's in the commit.
- [ ] Passes All Tests Whitespace checker. Warnings test. Additional tests via comments.
- [x] Contribution Agreement For non-Argonne authors, check contribution agreement. If necessary, request an explicit comment from your company's PR approval manager.
@hzhou I am rebasing the CMA code for testing. Are the IPC cleanup commits still relevant?
The IPC cleanup is the main purpose of this PR. You can try cherry-picking the CMA commit for your testing.
I pushed a few changes to fix the GPU and non-GPU builds. I also made the CMA configure option follow the same fashion as the rest of the shmmods.
I should have rebased first. I will add back the whitespace changes from your last update.
test:mpich/ch4/ofi test:mpich/ch4/gpu/ofi test:mpich/ch4/xpmem All ✔️
test:mpich/ch4/ofi ✔️
test:mpich/ch4/gpu/ofi ✔️
test:mpich/ch4/xpmem - ipc src_dt_ptr was unset
test:mpich/custom netmod: ch4:ofi config: cma
tag @raffenet for review
test:mpich/ch4/xpmem - 2 failures
- TIMEOUT - vci - pt2pt/sendflood 8
- avltree leak - debug - coll/alltoallw_zeros 8
test:mpich/custom ✔️ netmod: ch4:ofi config: cma
test:mpich/ch4/xpmem
2 failures:
- TIMEOUT - asan - coll/nonblocking3 5
- TIMEOUT - vci - pt2pt/sendflood 8
These are likely performance issues not related to this PR.
@raffenet This PR is ready to go.