rccl

Fix synchronization in allreduce8Read kernel

Open • dsidler opened this issue 11 months ago • 4 comments

Running the allreduce8Read kernel across 64 vGPUs (in CPX mode) revealed synchronization bugs. This PR addresses them by:

  • Synchronizing all threads in the block before signaling that the output channels (outChannels) are valid. This guarantees that every data write in the block is ordered before the corresponding signal.
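A minimal host-side C++ sketch of the ordering problem in the bullet above. This is not the kernel's code: `BlockBarrier`, `OutChannel`, `produce`, and `consume` are hypothetical names. A single-use barrier stands in for a block-wide barrier such as HIP's `__syncthreads()`, and a `std::atomic<bool>` store/load with release/acquire ordering stands in for the valid flag a peer polls. Without the barrier, thread 0 could publish the flag while other threads are still writing their slices.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical single-use barrier standing in for __syncthreads(): no thread
// returns from arrive_and_wait() until all `n` threads have called it.
class BlockBarrier {
 public:
  explicit BlockBarrier(int n) : remaining_(n) {}
  void arrive_and_wait() {
    std::unique_lock<std::mutex> lock(m_);
    if (--remaining_ == 0) {
      cv_.notify_all();
    } else {
      cv_.wait(lock, [this] { return remaining_ == 0; });
    }
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  int remaining_;
};

constexpr int kThreads = 8;      // threads in the hypothetical "block"
constexpr int kSliceLen = 1024;  // elements written by each thread

// Hypothetical stand-in for one outChannel: a data buffer plus a valid flag.
struct OutChannel {
  std::vector<int> data = std::vector<int>(kThreads * kSliceLen);
  std::atomic<bool> valid{false};
};

// Every thread writes its slice; the barrier orders all of those writes
// before thread 0's release store of the flag.
void produce(OutChannel& out) {
  BlockBarrier sync(kThreads);
  std::vector<std::thread> block;
  for (int tid = 0; tid < kThreads; ++tid) {
    block.emplace_back([&out, &sync, tid] {
      for (int i = 0; i < kSliceLen; ++i) out.data[tid * kSliceLen + i] = tid;
      sync.arrive_and_wait();  // without this, thread 0 may signal too early
      if (tid == 0) out.valid.store(true, std::memory_order_release);
    });
  }
  for (auto& t : block) t.join();
}

// Peer side: the acquire load pairs with the release store, so once the flag
// is seen, every slice written before the barrier is visible.
long long consume(const OutChannel& out) {
  while (!out.valid.load(std::memory_order_acquire)) std::this_thread::yield();
  long long sum = 0;
  for (int v : out.data) sum += v;
  return sum;
}
```

A peer thread can call `consume(out)` while `produce(out)` runs; with 8 threads writing 1024 elements each, it should observe the full sum, 1024 * (0 + 1 + ... + 7) = 28672.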

The following changes do not affect correctness:

  • Do not synchronize outChannels on every iteration.
  • Synchronize the input channels only once, at the beginning of kernel execution.
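The input-channel change can be sketched the same way, assuming (this is not stated in the PR) that each input buffer is completely written before its ready flag is set and is not rewritten while the kernel runs. Under that assumption, one acquire load per input at the start is enough, and the per-chunk loop needs no further waits. `InChannel`, `wait_ready`, and `reduce_chunks` are hypothetical names; the outChannels change is not modeled here.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical input channel: a peer fills `data`, then publishes `ready`.
struct InChannel {
  std::vector<int> data;
  std::atomic<bool> ready{false};
};

// Spin until the ready flag is observed. The acquire load pairs with the
// peer's release store, making its earlier writes to `data` visible.
inline void wait_ready(const InChannel& ch) {
  while (!ch.ready.load(std::memory_order_acquire)) std::this_thread::yield();
}

// Element-wise sum of all inputs, processed in `numChunks` chunks of
// `chunkLen` elements. Inputs are waited on once, up front; the chunk loop
// does no further synchronization because those acquire loads already cover
// every chunk (given the assumption stated above).
std::vector<long long> reduce_chunks(const std::vector<InChannel>& in,
                                     int numChunks, int chunkLen) {
  for (const auto& ch : in) wait_ready(ch);  // once, not once per chunk
  std::vector<long long> out(static_cast<std::size_t>(numChunks) * chunkLen, 0);
  for (int c = 0; c < numChunks; ++c) {
    for (int i = 0; i < chunkLen; ++i) {
      const std::size_t idx = static_cast<std::size_t>(c) * chunkLen + i;
      for (const auto& ch : in) out[idx] += ch.data[idx];
    }
  }
  return out;
}
```

If peers instead streamed chunks into a buffer incrementally, a single up-front wait would no longer be sufficient and per-chunk flags would be needed again.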

dsidler • Dec 10 '24 22:12

@nusislam, does the fix look good to you? If so, I will resolve the merge conflict.

dsidler • Jan 22 '25 22:01

I could not build your branch: applying the patch failed with "error: corrupt patch at line 353".

nusislam • Jan 23 '25 14:01

@nusislam This branch should now build. Do you have any comments on it?

corey-derochie-amd • Sep 08 '25 14:09

I could not reproduce any issue running the existing allreduce8Read kernel on 64 GPUs.

nusislam • Sep 08 '25 15:09