ch4/rma/gpu: bypass yaksa when copying contiguous GPU buffers

Open abrooks98 opened this issue 2 years ago • 1 comments

Pull Request Description

This PR adds support to bypass Yaksa in RMA paths for contiguous buffers.

Only the last 3 commits belong to this PR. Depends on #6608 (for MPIR_Ilocalcopy and all related non-blocking copy support).
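To illustrate the idea behind the description, here is a minimal host-side sketch of a contiguous fast path. It is not the code from this PR: `layout_t`, `layout_is_contig`, `pack_generic`, and `copy_to_packed` are hypothetical names, the strided loop only stands in for a general pack engine such as Yaksa, and plain `memcpy` stands in for whatever device copy the GPU runtime provides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified layout: `count` blocks of `blocklen` bytes,
 * with `stride` bytes between block starts. Not an MPICH datatype. */
typedef struct {
    size_t count;
    size_t blocklen;
    size_t stride;
} layout_t;

/* A layout is contiguous when no gaps separate consecutive blocks. */
static int layout_is_contig(const layout_t *l)
{
    return l->count <= 1 || l->stride == l->blocklen;
}

/* Generic path (stand-in for a pack engine): gather the blocks one by one
 * into a packed destination buffer. */
static void pack_generic(void *dst, const void *src, const layout_t *l)
{
    char *d = dst;
    const char *s = src;
    for (size_t i = 0; i < l->count; i++)
        memcpy(d + i * l->blocklen, s + i * l->stride, l->blocklen);
}

/* Copy `src` into packed `dst`. Returns 1 if the contiguous fast path was
 * taken (a single flat copy), 0 if the generic path was used. */
static int copy_to_packed(void *dst, const void *src, const layout_t *l)
{
    assert(dst != NULL && src != NULL && l != NULL);
    if (layout_is_contig(l)) {
        /* Assumption: on a GPU this would be one device copy call. */
        memcpy(dst, src, l->count * l->blocklen);
        return 1;
    }
    pack_generic(dst, src, l);
    return 0;
}
```

With a layout such as `{2, 4, 4}` the whole buffer is copied in one call, while `{2, 2, 4}` falls back to the block-by-block path. The point of the check is that a contiguous buffer needs no datatype processing at all, so the pack engine's setup cost can be skipped.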

Author Checklist

  • [x] Provide Description: Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • [x] Commits Follow Good Practice: Commits are self-contained and do not do two things at once. Commit message is of the form: module: short description. Commit message explains what's in the commit.
  • [ ] Passes All Tests: Whitespace checker. Warnings test. Additional tests via comments.
  • [x] Contribution Agreement: For non-Argonne authors, check contribution agreement. If necessary, request an explicit comment from your company's PR approval manager.

abrooks98 avatar Aug 24 '23 20:08 abrooks98

Going to mark this as ready for review (pending approval/merge of #6608)

abrooks98 avatar Oct 05 '23 16:10 abrooks98

test:mpich/ch4/most

abrooks98 avatar Apr 01 '24 13:04 abrooks98

test:mpich/ch4/gpu

abrooks98 avatar Apr 01 '24 13:04 abrooks98

test:mpich/ch4/ofi

abrooks98 avatar Apr 01 '24 18:04 abrooks98

@hzhou ready for review - I think the test failures are unrelated, but let me know if you think otherwise.

abrooks98 avatar Apr 01 '24 20:04 abrooks98

test:mpich/ch4/gpu/ofi ✔️ except for two CUDA memory allocation errors.

hzhou avatar Apr 02 '24 19:04 hzhou

test:mpich/ch4/gpu

abrooks98 avatar Apr 04 '24 17:04 abrooks98

Ready for re-review @hzhou

abrooks98 avatar Apr 05 '24 16:04 abrooks98

@abrooks98 Were the gpu/ofi tests clean?

hzhou avatar Apr 11 '24 14:04 hzhou

test:mpich/ch4/gpu

abrooks98 avatar Apr 11 '24 14:04 abrooks98

> @abrooks98 Were the gpu/ofi tests clean?

Going to run again because the one I kicked off last time had a Jenkins failure (Java heap space out of memory)

abrooks98 avatar Apr 11 '24 14:04 abrooks98

Recording a sample of the test failures: [image]

I don't think they are related to this PR, so I will merge it and figure out the test failure separately.

hzhou avatar Apr 11 '24 17:04 hzhou