ucc
Add some CPU collectives to the NCCL TL
What
Adds support for CPU collectives to the NCCL TL (only a subset of collectives is covered for now).
Why ?
Currently UCC requires two TLs (NCCL and UCP) to fully support NVIDIA GPU platforms. This patch lets the NCCL TL handle CPU collectives as well, so full coverage no longer depends on two different TLs.
How ?
Stages CPU data through the GPU and makes NCCL calls on the GPU-resident data.
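For reviewers who want the staging flow spelled out, here is a minimal sketch of the pattern, not the code in this patch. The `staging_ops_t` hooks, `staged_allreduce_sum`, and the host-only `host_ops` stand-in are all hypothetical names introduced for illustration. In a real build the hooks would presumably wrap calls such as `cudaMalloc`, `cudaMemcpyAsync`, `ncclAllReduce`, and `cudaStreamSynchronize`. The stand-in imitates a two-rank sum so the flow can be exercised without a GPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical backend hooks (not UCC or NCCL API). A GPU backend would
 * likely implement these with cudaMalloc/cudaFree, cudaMemcpyAsync plus a
 * stream sync, and ncclAllReduce on the staged buffers. */
typedef struct staging_ops {
    void *(*dev_alloc)(size_t bytes);
    void  (*dev_free)(void *ptr);
    int   (*copy_h2d)(void *dev_dst, const void *host_src, size_t bytes);
    int   (*copy_d2h)(void *host_dst, const void *dev_src, size_t bytes);
    int   (*allreduce_sum)(const float *dev_send, float *dev_recv,
                           size_t count);
} staging_ops_t;

/* Stage host data into "device" buffers, run the collective there, and
 * copy the result back to host memory. Returns 0 on success. */
int staged_allreduce_sum(const staging_ops_t *ops, const float *host_src,
                         float *host_dst, size_t count)
{
    size_t bytes = count * sizeof(float);
    float *dev_src, *dev_dst;
    int    status = -1;

    assert(ops != NULL && host_src != NULL && host_dst != NULL);

    dev_src = ops->dev_alloc(bytes);
    dev_dst = ops->dev_alloc(bytes);
    if (dev_src != NULL && dev_dst != NULL) {
        /* 1. host -> device staging copy */
        status = ops->copy_h2d(dev_src, host_src, bytes);
        /* 2. collective on the device-resident data */
        if (status == 0) {
            status = ops->allreduce_sum(dev_src, dev_dst, count);
        }
        /* 3. device -> host copy of the result */
        if (status == 0) {
            status = ops->copy_d2h(host_dst, dev_dst, bytes);
        }
    }
    ops->dev_free(dev_src);
    ops->dev_free(dev_dst);
    return status;
}

/* Host-only stand-in backend (assumption for illustration): "device"
 * memory is plain malloc memory, and the allreduce imitates
 * MOCK_WORLD_SIZE ranks that all contribute the same buffer. */
#define MOCK_WORLD_SIZE 2

static void *host_alloc(size_t bytes) { return malloc(bytes); }
static void  host_free(void *ptr)     { free(ptr); }

static int host_copy(void *dst, const void *src, size_t bytes)
{
    memcpy(dst, src, bytes);
    return 0;
}

static int host_allreduce_sum(const float *send, float *recv, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        recv[i] = send[i] * MOCK_WORLD_SIZE;
    }
    return 0;
}

static const staging_ops_t host_ops = {
    host_alloc, host_free, host_copy, host_copy, host_allreduce_sum
};
```

Calling `staged_allreduce_sum(&host_ops, in, out, n)` with host arrays `in` and `out` runs the three steps against the stand-in. A production path would likely also reuse staging buffers across calls and overlap the copies with the collective on a stream, which this sketch leaves out for brevity.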
Can one of the admins verify this patch?