rccl
Add device synchronization before destroying proxy thread.
This commit ensures that the GPU finishes all kernels before the communicator's proxy thread is destroyed.
Details
Work item: "Internal", or link to GitHub issue (if applicable).
What were the changes?
Ensure that all kernels finish executing before the proxy thread is destroyed.
Why were the changes made?
Some applications could hang if a rank exited too early, tearing down its proxy thread while its kernels were still executing.
How was the outcome achieved?
Added a device synchronization before calling the function that destroys the proxy thread.
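As a rough illustration of why the ordering matters, here is a host-only C++ model that uses standard threads in place of GPU kernels and the RCCL proxy thread. It is not the actual RCCL change: `ToyComm`, `LaunchKernel`, `Destroy`, and `RunDemo` are hypothetical names, and "waiting on every kernel future" stands in for the device synchronization (the PR title suggests something like `hipDeviceSynchronize()`, but this sketch does not depend on which call is used).

```cpp
#include <atomic>
#include <future>
#include <thread>
#include <vector>

// Hypothetical host-only model (not RCCL code): each "kernel" is an async task
// that posts one request and cannot finish until the proxy thread serves it.
class ToyComm {
 public:
  ToyComm() : proxy_([this] { ProxyLoop(); }) {}

  ~ToyComm() {
    if (proxy_.joinable()) Destroy();
  }

  // Launch a "kernel" that depends on the proxy thread to make progress.
  void LaunchKernel() {
    kernels_.push_back(std::async(std::launch::async, [this] {
      int ticket = requests_.fetch_add(1) + 1;                    // post a request
      while (served_.load() < ticket) std::this_thread::yield();  // wait for proxy
    }));
  }

  // Teardown in the order the PR describes: synchronize first (here, wait for
  // every outstanding "kernel"), and only then stop and join the proxy thread.
  // Reversing these steps can leave unserved kernels spinning forever.
  void Destroy() {
    for (auto& k : kernels_) k.wait();
    stop_.store(true);
    proxy_.join();
  }

  int Served() const { return served_.load(); }

 private:
  // Proxy loop: serve pending requests until asked to stop.
  void ProxyLoop() {
    while (!stop_.load()) {
      if (served_.load() < requests_.load()) {
        served_.fetch_add(1);
      } else {
        std::this_thread::yield();
      }
    }
  }

  // Declared before proxy_ so they are initialized before the thread starts.
  std::atomic<bool> stop_{false};
  std::atomic<int> requests_{0};
  std::atomic<int> served_{0};
  std::vector<std::future<void>> kernels_;
  std::thread proxy_;
};

// Launch n "kernels", tear down safely, and report how many requests were served.
int RunDemo(int n) {
  ToyComm comm;
  for (int i = 0; i < n; ++i) comm.LaunchKernel();
  comm.Destroy();
  return comm.Served();
}
```

Calling `RunDemo(4)` returns 4 once every toy kernel has been served. Moving the wait loop in `Destroy()` after `proxy_.join()` can leave kernels waiting on a proxy that no longer exists, which mirrors the kind of hang described in this PR.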
Additional Documentation:
Approval Checklist
Do not approve until these items are satisfied.
- [x] Verify the CHANGELOG has been updated, if
  - there are any NCCL API version changes,
  - any changes impact library users, and/or
  - any changes impact any other ROCm library.