
Add device synchronization before destroying proxy thread.

Open thananon opened this issue 8 months ago • 0 comments

This commit ensures that the GPU finishes executing all kernels before the communicator's proxy thread is destroyed.

Details

Do not mention proprietary info or link to internal work items in this PR.

Work item: "Internal", or link to GitHub issue (if applicable).

What were the changes?
Ensure that all kernels finish executing before the proxy thread is destroyed.

Why were the changes made?
Some applications might hang if a rank exits too early.

How was the outcome achieved?
Added a device synchronization before calling the proxy destroy function.
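The teardown ordering described above can be modeled in a few lines. This is a minimal Python sketch, not RCCL code: host threads stand in for GPU kernels and for the proxy thread, and `FakeDevice`, `Proxy`, `comm_destroy`, and `kernel` are hypothetical names. It assumes each kernel posts work that only the proxy thread can complete. In RCCL itself the synchronization would be a HIP call such as `hipDeviceSynchronize()`; the PR text does not say which call the commit actually uses.

```python
import threading
import time


class FakeDevice:
    """Hypothetical stand-in for a GPU: each 'kernel' is a host thread."""

    def __init__(self):
        self.kernels = []

    def launch(self, body):
        t = threading.Thread(target=body)
        t.start()
        self.kernels.append(t)

    def synchronize(self):
        # Analogue of a device-wide sync: block until every launched kernel has finished.
        for t in self.kernels:
            t.join()
        self.kernels.clear()


class Proxy:
    """Hypothetical proxy thread that services requests posted by kernels."""

    def __init__(self):
        self.cv = threading.Condition()
        self.posted = 0      # ticket counter: requests ever posted
        self.pending = 0     # requests waiting to be serviced
        self.completed = 0   # requests the proxy has serviced
        self.stop = False
        self.worker = threading.Thread(target=self._run)
        self.worker.start()

    def _run(self):
        with self.cv:
            while True:
                self.cv.wait_for(lambda: self.stop or self.pending > 0)
                if self.pending > 0:
                    self.pending -= 1
                    self.completed += 1
                    self.cv.notify_all()  # wake kernels waiting on their ticket
                elif self.stop:
                    return

    def request_and_wait(self):
        # Called from a kernel: post one request, then block until it is serviced.
        # If the proxy thread has already exited, this wait never returns (the hang).
        with self.cv:
            self.posted += 1
            ticket = self.posted
            self.pending += 1
            self.cv.notify_all()
            self.cv.wait_for(lambda: self.completed >= ticket)

    def destroy(self):
        with self.cv:
            self.stop = True
            self.cv.notify_all()
        self.worker.join()


def comm_destroy(device, proxy):
    """Hypothetical teardown in the PR's order: synchronize first, then destroy the proxy."""
    device.synchronize()  # wait for all outstanding kernels
    proxy.destroy()       # no kernel can still depend on the proxy


def kernel(proxy):
    time.sleep(0.01)  # the kernel is still running when teardown starts
    proxy.request_and_wait()


device = FakeDevice()
proxy = Proxy()
for _ in range(4):
    device.launch(lambda: kernel(proxy))
comm_destroy(device, proxy)
print(f"serviced {proxy.completed} requests")  # prints "serviced 4 requests"
```

Swapping the two calls in `comm_destroy` lets the proxy thread exit while the kernels are still sleeping, so their `request_and_wait` calls typically block forever. That is a single-process analogue of the hang the PR describes when a rank tears down too early.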

Additional Documentation:

Approval Checklist

Do not approve until these items are satisfied.

  • [x] Verify the CHANGELOG has been updated, if
    • there are any NCCL API version changes,
    • any changes impact library users, and/or
    • any changes impact any other ROCm library.

thananon · Apr 08 '25 21:04