
ch4: use am_tag_{send,recv} in RMA get/put

Open • hzhou opened this pull request 1 year ago • 7 comments

Pull Request Description

Use am_tag_{send,recv} in RMA get/put.

This potentially works around the issue #7118

[skip warnings]

TODO

  • [x] Get
  • [x] Put

Author Checklist

  • [x] Provide Description: Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • [x] Commits Follow Good Practice: Commits are self-contained and do not do two things at once. Commit message is of the form "module: short description". Commit message explains what's in the commit.
  • [ ] Passes All Tests: Whitespace checker. Warnings test. Additional tests via comments.
  • [x] Contribution Agreement: For non-Argonne authors, check the contribution agreement. If necessary, request an explicit comment from your company's PR approval manager.

hzhou • Nov 07 '24

test:mpich/ch4/ucx test:,mpich/ch4/ofi/more

hzhou • Nov 07 '24

test:mpich/ch4/most test:mpich/ch4/ofi/more

hzhou • Nov 10 '24

@hzhou can you rebase this on main and resolve conflicts? Now that some GPU fixes are in to resolve memory issues, we are hoping this will now fix #7118

abrooks98 • Mar 24 '25

test:mpich/ch4/most test:mpich/ch4/ofi/more

hzhou • Mar 25 '25

> @hzhou can you rebase this on main and resolve conflicts? Now that some GPU fixes are in to resolve memory issues, we are hoping this will now fix #7118

Sure. Please test on Aurora. Last time, the user reported still seeing the hang with this PR.

hzhou • Mar 25 '25

Is it worth updating this PR to use the new rndv/pipeline infrastructure? Either way, it is worth retesting as-is with https://github.com/pmodels/mpich/issues/7118. I will update the PR to facilitate testing.

raffenet • Sep 24 '25

Updated.

hzhou • Oct 10 '25