Enable MSCCL++ UBR tests for AllReduce and AllGather with TestBed::RunSimpleSweep
Details
Do not mention proprietary info or link to internal work items in this PR.
Work item: "Internal", or link to GitHub issue (if applicable).
What were the changes?
Added unit tests for MSCCL++ AllGather and AllReduce in UBR (user buffer registration) mode, built on TestBed::RunSimpleSweep.
Why were the changes made?
Previously, these unit tests used standalone routines rather than the TestBed infrastructure. Attempts to use TestBed::RunSimpleSweep caused a hang during MSCCL++-enabled ncclCommRegister.
How was the outcome achieved?
Added registration of the input/output buffers and made AllocateMem non-blocking.
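The description does not say why a non-blocking AllocateMem avoids the hang. One plausible reading, stated here as an assumption and not as a description of TestBed: if MSCCL++-enabled registration needs every rank to participate at the same time, a parent that sends the command to one rank and waits for its reply before contacting the next rank would deadlock. The sketch below shows the non-blocking alternative: start the work on all ranks first, then collect the results. RendezvousBarrier, AllocateAndRegister, and DispatchToAllRanks are hypothetical names, and threads stand in for TestBed's child processes.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a collective step (e.g. a registration that
// exchanges handles between ranks): no rank returns until all ranks arrive.
class RendezvousBarrier {
 public:
  explicit RendezvousBarrier(std::size_t n) : expected_(n) {}

  void ArriveAndWait() {
    std::unique_lock<std::mutex> lock(mu_);
    assert(arrived_ < expected_ && "more arrivals than ranks");
    if (++arrived_ == expected_) {
      cv_.notify_all();  // last rank releases everyone
      return;
    }
    cv_.wait(lock, [this] { return arrived_ == expected_; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::size_t expected_;
  std::size_t arrived_ = 0;
};

// Hypothetical per-rank work: allocate buffers, then join the collective step.
int AllocateAndRegister(int rank, RendezvousBarrier& barrier) {
  // ... allocate this rank's input/output buffers here ...
  barrier.ArriveAndWait();  // registration assumed to need every rank present
  return rank;
}

// Non-blocking dispatch: launch the command on every rank, then wait for all.
// A blocking loop that waited on rank 0 before launching rank 1 would never
// finish, because rank 0 would sit in the barrier waiting for the others.
std::vector<int> DispatchToAllRanks(int numRanks) {
  RendezvousBarrier barrier(static_cast<std::size_t>(numRanks));
  std::vector<std::future<int>> pending;
  for (int r = 0; r < numRanks; ++r) {
    pending.push_back(std::async(std::launch::async, AllocateAndRegister, r,
                                 std::ref(barrier)));
  }
  std::vector<int> completed;
  for (auto& f : pending) completed.push_back(f.get());
  return completed;
}
```

The design point is the split between "send to all" and "wait for all": waiting happens only after every rank has been told to start, so a step that requires all ranks can make progress.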
Additional Documentation:
MSCCL++ does not support single-process mode, so the unit tests fail unless UT_PROCESS_MASK is set to 2. For this reason, each added test calls setenv/unsetenv to set UT_PROCESS_MASK only within its own scope.
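One way to keep the setenv/unsetenv pair scoped to a single test, even if the test exits early on a failed assertion, is an RAII guard. This is a hypothetical helper (ScopedEnv is not part of the PR or of TestBed); it uses the POSIX setenv/unsetenv calls and also restores any value the variable held before the test, rather than always unsetting it.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <stdlib.h>  // POSIX setenv/unsetenv
#include <string>
#include <utility>

// Hypothetical RAII helper: sets an environment variable for the lifetime of
// the object and restores the previous state (old value or absence) on exit.
class ScopedEnv {
 public:
  ScopedEnv(std::string name, const std::string& value) : name_(std::move(name)) {
    if (const char* prev = std::getenv(name_.c_str())) previous_ = prev;
    [[maybe_unused]] int rc = setenv(name_.c_str(), value.c_str(), /*overwrite=*/1);
    assert(rc == 0 && "setenv failed");
  }

  ~ScopedEnv() {
    if (previous_) {
      setenv(name_.c_str(), previous_->c_str(), /*overwrite=*/1);
    } else {
      unsetenv(name_.c_str());
    }
  }

  ScopedEnv(const ScopedEnv&) = delete;
  ScopedEnv& operator=(const ScopedEnv&) = delete;

 private:
  std::string name_;
  std::optional<std::string> previous_;
};
```

Inside a test body this would read `ScopedEnv mask("UT_PROCESS_MASK", "2");` before calling TestBed::RunSimpleSweep. This assumes TestBed reads UT_PROCESS_MASK after the test body starts; if it is read earlier (for example at fixture construction), the guard would need to be created in that setup step instead.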
Approval Checklist
Do not approve until these items are satisfied.
- [ ] Verify the CHANGELOG has been updated, if
  - there are any NCCL API version changes,
  - any changes impact library users, and/or
  - any changes impact any other ROCm library.
Archiving this PR. Please remove the noCI label when it is ready, or close this PR if it is no longer needed.