ch4/rma/gpu: bypass yaksa when copying contiguous GPU buffers
Pull Request Description
This PR lets the RMA paths bypass Yaksa when copying contiguous GPU buffers.
Only the last 3 commits are part of this PR.
Depends on #6608 (for MPIR_Ilocalcopy and all related non-blocking copy support)
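The idea of the bypass can be illustrated with a minimal host-only sketch. It is not the code in this PR: `layout_t`, `layout_is_contig`, `pack_generic`, and `copy_with_bypass` are hypothetical names, plain `memcpy` stands in for the device-aware copy, and the block-by-block loop stands in for the general datatype engine (Yaksa's role in MPICH). The sketch also assumes `stride >= blocklen`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout descriptor (not an MPICH type): `count` blocks of
 * `blocklen` bytes, with consecutive blocks starting `stride` bytes apart.
 * Assumes stride >= blocklen. */
typedef struct {
    size_t count;
    size_t blocklen;
    size_t stride;
} layout_t;

/* A layout is contiguous when there are no gaps between blocks. */
int layout_is_contig(const layout_t *l)
{
    return l->count <= 1 || l->stride == l->blocklen;
}

/* Stand-in for the general pack path: gathers the source blocks one by one
 * into a dense destination buffer. Returns the number of bytes packed. */
size_t pack_generic(void *dst, const void *src, const layout_t *l)
{
    char *out = dst;
    const char *in = src;
    for (size_t i = 0; i < l->count; i++) {
        memcpy(out + i * l->blocklen, in + i * l->stride, l->blocklen);
    }
    return l->count * l->blocklen;
}

/* Copies `src` into a dense `dst`. Contiguous layouts skip the pack path
 * and use a single flat copy. Returns 1 if the bypass was taken, else 0. */
int copy_with_bypass(void *dst, const void *src, const layout_t *l)
{
    assert(dst != NULL && src != NULL && l != NULL);
    if (layout_is_contig(l)) {
        /* Fast path: one flat copy, no datatype engine involved. */
        memcpy(dst, src, l->count * l->blocklen);
        return 1;
    }
    /* Slow path: fall back to the general engine. */
    pack_generic(dst, src, l);
    return 0;
}
```

Calling `copy_with_bypass` with a layout of `{2, 4, 4}` takes the flat-copy branch, while `{2, 2, 4}` falls back to `pack_generic`. In the real RMA path the flat copy would presumably be issued through the non-blocking copy support from #6608 rather than `memcpy`; that mapping is an assumption of this sketch.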
Author Checklist
- [x] Provide Description Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
- [x] Commits Follow Good Practice
Commits are self-contained and do not do two things at once.
Commit message is of the form: `module: short description`
Commit message explains what's in the commit.
- [ ] Passes All Tests Whitespace checker. Warnings test. Additional tests via comments.
- [x] Contribution Agreement For non-Argonne authors, check contribution agreement. If necessary, request an explicit comment from your company's PR approval manager.
Going to mark this as ready for review (pending approval/merge of #6608)
test:mpich/ch4/most
test:mpich/ch4/gpu
test:mpich/ch4/ofi
@hzhou ready for review - I think the test failures are unrelated, but let me know if you think otherwise.
test:mpich/ch4/gpu/ofi ✔️ except for two CUDA memory allocation errors.
test:mpich/ch4/gpu
Ready for re-review @hzhou
@abrooks98 Were the gpu/ofi tests clean?
test:mpich/ch4/gpu
> @abrooks98 Were the gpu/ofi tests clean?
Going to run again because the one I kicked off last time had a Jenkins failure (Java heap space out of memory)
Recording sample of the test failures:
I don't think they are related to this PR, so I will merge it and figure out the test failure separately.