cub Explicitly document synchronization requirements in Warp-level APIs

Open desert0616 opened this issue 6 years ago • 1 comments

For all warp-level CUB APIs, such as WarpScan, the examples in the documentation do not call __syncwarp to synchronize the threads within a warp.

However, it seems that on Volta, threads within a warp are no longer implicitly synchronized, so __syncwarp appears to be required before and after warp-level communication.

So, do I need to call __syncwarp around WarpScan, just as __syncthreads is used with BlockScan?

Thanks!

Aug 11 '18 08:08 desert0616

Just to close the loop: a final sync is not needed, and the shfl-based implementation will use the shfl.sync instructions when required for a given architecture. Leaving this open as a reminder to address this in our next documentation push.

May 06 '22 22:05 alliepiper
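
For readers landing on this issue, the reply above can be illustrated with a minimal sketch of a single cub::WarpScan call that has no __syncwarp before or after it. The kernel name WarpPrefixSum, the constants, and the one-block launch of four warps are illustrative assumptions, not taken from the CUB documentation; the sketch also assumes a 32-thread logical warp, which is the WarpScan default.

```cuda
#include <cstdio>
#include <cub/cub.cuh>

// Illustrative constants (assumptions for this sketch).
constexpr int kWarpSize      = 32;
constexpr int kWarpsPerBlock = 4;
constexpr int kThreads       = kWarpSize * kWarpsPerBlock;

// Hypothetical kernel: each warp computes an inclusive prefix sum of its 32 inputs.
__global__ void WarpPrefixSum(const int* d_in, int* d_out)
{
    using WarpScan = cub::WarpScan<int>;

    // One TempStorage per warp in the block.
    __shared__ typename WarpScan::TempStorage temp_storage[kWarpsPerBlock];

    const int warp_id = threadIdx.x / kWarpSize;
    int thread_data   = d_in[threadIdx.x];

    // No __syncwarp() before or after this call, per the maintainer's reply.
    WarpScan(temp_storage[warp_id]).InclusiveSum(thread_data, thread_data);

    d_out[threadIdx.x] = thread_data;
}

int main()
{
    int h_in[kThreads];
    int h_out[kThreads];
    for (int i = 0; i < kThreads; ++i) h_in[i] = 1;

    int* d_in  = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    WarpPrefixSum<<<1, kThreads>>>(d_in, d_out);

    // The blocking copy also waits for the kernel to finish.
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    // With all-ones input, lane i of every warp should hold i + 1.
    int mismatches = 0;
    for (int i = 0; i < kThreads; ++i)
        if (h_out[i] != (i % kWarpSize) + 1) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    cudaFree(d_in);
    cudaFree(d_out);
    return mismatches != 0;
}
```

This covers only the single-call case asked about. Reusing the same temp_storage for a second warp collective is a separate situation; as far as I know, the current CUB documentation recommends a __syncwarp() between such calls.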
