[QST] How to Handle Synchronization with Different Thread Counts for Producer and Consumer in CUTLASS?

Open. ziyuhuang123 opened this issue 2 months ago • 1 comment

When a kernel has both producer and consumer threads, how should they synchronize using CUTLASS's barrier.sync/arrive? As I understand it, in barrier.arrive(a, b), a is the number of threads that must reach the barrier and b is the barrier ID. However, the producer and consumer thread counts are often different, so it is not obvious which count to pass.

In FlashAttention3, I saw this example: https://github.com/Dao-AILab/flash-attention/blob/0dfb28174333d9eefb7c1dd4292690a8458d1e89/hopper/mainloop_fwd_sm90_tma_gmma_ws.hpp#L651

Here, the a argument is the consumer thread count (256) plus the number of active producer threads (32), i.e. 288. I don't understand why the two counts are added together.
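For reference, here is a minimal sketch of the pattern in question, written with inline PTX rather than CUTLASS so that it is self-contained. It assumes, based on my reading of the PTX bar.sync/bar.arrive semantics, that the thread count is the total number of threads taking part in that barrier, whether they wait (sync) or only signal (arrive); under that reading 256 arriving consumers plus 32 waiting producer threads gives 288. The helper names, the barrier ID 1, and the shared counter are hypothetical and are not taken from FlashAttention3 or CUTLASS.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kProducerThreads = 32;   // one producer warp (count taken from the question)
constexpr int kConsumerThreads = 256;  // consumer threads (count taken from the question)
constexpr int kBarrierThreads  = kProducerThreads + kConsumerThreads;  // 288 participants
constexpr int kBarrierId       = 1;    // hypothetical ID; barrier 0 backs __syncthreads()

// Hypothetical helpers that mirror the (num_threads, barrier_id) order used in the question.
// PTX itself takes the barrier ID first and the thread count second.
__device__ inline void named_barrier_sync(int num_threads, int barrier_id) {
  asm volatile("bar.sync %0, %1;" : : "r"(barrier_id), "r"(num_threads) : "memory");
}
__device__ inline void named_barrier_arrive(int num_threads, int barrier_id) {
  asm volatile("bar.arrive %0, %1;" : : "r"(barrier_id), "r"(num_threads) : "memory");
}

__global__ void demo_kernel(int* observed) {
  __shared__ int released;  // number of consumers that have signalled
  if (threadIdx.x == 0) released = 0;
  __syncthreads();

  if (threadIdx.x < kProducerThreads) {
    // Producer warp: wait until all 288 participants have reached the barrier.
    named_barrier_sync(kBarrierThreads, kBarrierId);
    if (threadIdx.x == 0) *observed = atomicAdd(&released, 0);  // atomic read
  } else {
    // Consumers: record the signal, then arrive and continue without waiting.
    atomicAdd(&released, 1);
    __threadfence_block();
    named_barrier_arrive(kBarrierThreads, kBarrierId);
  }
}

// Launches one block of 288 threads and returns the count seen by the producer (-1 on error).
int run_demo() {
  int* d_observed = nullptr;
  int h_observed = -1;
  if (cudaMalloc(&d_observed, sizeof(int)) != cudaSuccess) return -1;
  cudaMemset(d_observed, 0xff, sizeof(int));
  demo_kernel<<<1, kBarrierThreads>>>(d_observed);
  cudaError_t err = cudaMemcpy(&h_observed, d_observed, sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_observed);
  return err == cudaSuccess ? h_observed : -1;
}

int main() {
  int observed = run_demo();
  std::printf("consumer arrivals seen by producer: %d (expected %d)\n",
              observed, kConsumerThreads);
  return observed == kConsumerThreads ? 0 : 1;
}
```

Both sides pass the same count because they refer to the same barrier: if the consumers passed 256 and the producer passed 32, the two calls would disagree about when the barrier is complete. Note that the branch is warp-uniform and 288 is a multiple of the warp size, which the bar instructions require.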

ziyuhuang123 · Dec 18 '24 11:12