rccl

Question about Group Assignment for P2P Primitives

Open arkhadem opened this issue 1 year ago • 0 comments

Hi everyone,

I am using RCCL 6.1.2. For P2P tests such as scatter and gather, I found that the logic in the appendWorkElemP2p and finishWorkP2p functions in enqueue.cc works as follows:

  • If a channel has both a send and a recv, the group count is 2: two 64-thread warps run the send and two run the recv.
  • If a channel has only a recv, the group count is 1: all four warps run the recv.
  • If a channel has only a send, the group count is still 2: two warps run the send and the other two sit idle!
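To make the three cases concrete, here is a small model of the split as I observe it. It is a sketch, not RCCL code: plan_channel and WarpPlan are hypothetical names, and it assumes 4 warps of 64 threads per channel, a recv placed in work-element slot 0, a send placed in slot 1, and a group count equal to the number of slots spanned, rounded up to a power of two. That slot rule is my guess at something that would reproduce the behavior, not something I confirmed in enqueue.cc.

```python
# Hypothetical model of the per-channel warp split described above.
# Assumptions (not taken from RCCL source): a channel runs 4 warps of
# 64 threads, a recv occupies work-element slot 0 and a send occupies
# slot 1, and the group count is the number of slots spanned, rounded
# up to a power of two.
from dataclasses import dataclass

WARPS_PER_CHANNEL = 4  # assumed: 4 x 64 threads = 256 threads per block


@dataclass
class WarpPlan:
    n_groups: int    # groups the channel's warps are split into
    send_warps: int  # warps running the send
    recv_warps: int  # warps running the recv
    idle_warps: int  # warps with no work


def plan_channel(has_send: bool, has_recv: bool) -> WarpPlan:
    # Hypothetical slot rule: recv -> slot 0, send -> slot 1. A lone send
    # still sits at index 1, so two slots are spanned and slot 0 stays empty.
    slots_spanned = 2 if has_send else (1 if has_recv else 0)
    n_groups = 1
    while n_groups < slots_spanned:
        n_groups *= 2
    warps_per_group = WARPS_PER_CHANNEL // n_groups
    recv = warps_per_group if has_recv else 0
    send = warps_per_group if has_send else 0
    return WarpPlan(n_groups, send, recv, WARPS_PER_CHANNEL - send - recv)


for label, s, r in [("send+recv", True, True),
                    ("recv only", False, True),
                    ("send only", True, False)]:
    print(label, plan_channel(s, r))
```

Under these assumptions the send-only case matches what I see: the empty recv slot still counts as a group, so the send gets only half the warps and the other half have nothing to do.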

I would like to understand the reasoning behind the last case, where a channel has only send primitives. This happens in the following situations:

  • In gather, all GPUs except the root have only send primitives to the root.
  • In scatter, the root GPU has only send primitives to all other GPUs.
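For illustration, the sketch below builds gather and scatter out of per-rank send/recv lists and labels each rank's role. It is an illustrative model, not the rccl-tests code: p2p_ops and role are hypothetical helpers, and it assumes the root's own chunk is handled as a local copy rather than as a send or recv.

```python
# Hypothetical sketch: classify each rank's P2P role in gather and scatter
# built from point-to-point send/recv. Assumption: the root's copy of its
# own chunk is a local copy and is not counted as a send or recv.
def p2p_ops(pattern: str, nranks: int, root: int) -> dict:
    ops = {r: {"send": [], "recv": []} for r in range(nranks)}
    for r in range(nranks):
        if r == root:
            continue  # skip the root's self-copy (assumed local)
        if pattern == "gather":
            ops[r]["send"].append(root)  # every non-root rank sends to the root
            ops[root]["recv"].append(r)  # the root receives from each of them
        elif pattern == "scatter":
            ops[root]["send"].append(r)  # the root sends a chunk to each rank
            ops[r]["recv"].append(root)  # each non-root rank receives one chunk
        else:
            raise ValueError(f"unknown pattern: {pattern}")
    return ops


def role(rank_ops: dict) -> str:
    has_send, has_recv = bool(rank_ops["send"]), bool(rank_ops["recv"])
    if has_send and has_recv:
        return "send+recv"
    if has_send:
        return "send only"
    return "recv only" if has_recv else "idle"


for pattern in ("gather", "scatter"):
    ops = p2p_ops(pattern, nranks=4, root=0)
    print(pattern, {r: role(o) for r, o in ops.items()})
```

With 4 ranks and root 0, gather makes ranks 1 to 3 send-only and scatter makes rank 0 send-only, which are exactly the cases where half the warps would sit idle.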

Best, Alireza

arkhadem, Jul 09 '24 01:07