rccl
Question about Group Assignment for P2P Primitives
Hi Everyone,
I am using RCCL 6.1.2. For the P2P-based tests, such as scatter and gather, the logic in the appendWorkElemP2p and finishWorkP2p functions of enqueue.cc appears to assign groups as follows:
- If there are both send and recv on a channel, group is 2: two 64-thread warps for send and two for recv.
- If there is only recv, group is 1: all four warps work on recv.
- If there is only send, group is 2: two warps work on send and two warps are idle!
I would like to understand the reasoning behind the last case, i.e., when a channel has only send primitives. This happens in the following situations:
- In gather, all GPUs except the root have only send primitives to the root.
- In scatter, the root GPU has only send primitives to all other GPUs.
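To make the question concrete, here is a minimal C++ sketch of the behavior I observe. It is a toy model, not RCCL source code: the names (P2pSplit, ChannelOps, modelSplit, gatherOps, scatterOps) are hypothetical, it assumes 4 warps per channel as in the counts above, and it ignores any self-copy on the root.

```cpp
// Toy model of the warp split described above; NOT RCCL source code.
// Assumption: each channel has 4 warps of 64 threads.
constexpr int kWarpsPerChannel = 4;

// Hypothetical names, introduced only for this illustration.
enum class ChannelOps { SendAndRecv, RecvOnly, SendOnly };

struct P2pSplit {
  int nGroups;    // how many groups the warps are divided into
  int sendWarps;  // warps working on send
  int recvWarps;  // warps working on recv
  int idleWarps;  // warps with no work
};

// Mirrors the three cases listed above: any send on the channel gives
// 2 groups, and a group with no matching primitive stays idle.
constexpr P2pSplit modelSplit(ChannelOps ops) {
  const bool hasSend = ops != ChannelOps::RecvOnly;
  const bool hasRecv = ops != ChannelOps::SendOnly;
  const int nGroups = hasSend ? 2 : 1;
  const int perGroup = kWarpsPerChannel / nGroups;
  const int sendWarps = hasSend ? perGroup : 0;
  const int recvWarps = hasRecv ? perGroup : 0;
  return P2pSplit{nGroups, sendWarps, recvWarps,
                  kWarpsPerChannel - sendWarps - recvWarps};
}

// Which primitives a rank issues (root self-copy ignored, an assumption).
constexpr ChannelOps gatherOps(int rank, int root) {
  return rank == root ? ChannelOps::RecvOnly : ChannelOps::SendOnly;
}
constexpr ChannelOps scatterOps(int rank, int root) {
  return rank == root ? ChannelOps::SendOnly : ChannelOps::RecvOnly;
}

// Compile-time checks against the counts quoted in the list above.
static_assert(modelSplit(ChannelOps::SendAndRecv).idleWarps == 0, "send+recv");
static_assert(modelSplit(ChannelOps::RecvOnly).recvWarps == 4, "recv only");
static_assert(modelSplit(ChannelOps::SendOnly).idleWarps == 2, "send only");
```

In this model, modelSplit(gatherOps(1, 0)) and modelSplit(scatterOps(0, 0)) both leave two of the four warps idle, which is the case I am asking about.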
Best, Alireza