
[QST] Why do we use three syncs in sgemm_1?

Open ziyuhuang123 opened this issue 11 months ago • 2 comments

I am learning this example: https://github.com/NVIDIA/cutlass/blob/c4e3e122e266644c61b4af33d0cc09f4c391a64b/examples/cute/tutorial/sgemm_1.cu#L209-L211

What is your question?

    cp_async_fence();        // Label the end of (potential) cp.async instructions
    cp_async_wait<0>();      // Sync on all (potential) cp.async instructions
    __syncthreads();         // Wait for all threads to write to smem

Why do we need three synchronization calls in sgemm_1?

ziyuhuang123 avatar Mar 29 '24 11:03 ziyuhuang123

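On newer architectures such as sm_80, the most basic form of copy(...), that is, one called without specifying a copy atom, might generate cp.async instructions. Because cp.async is asynchronous, you need further safety guarantees: cp_async_fence() commits the cp.async instructions issued so far as one group, cp_async_wait<0>() blocks the calling thread until its own committed copies have completed, and __syncthreads() then makes every thread's shared-memory writes visible to the whole block. When no cp.async is generated, the first two calls have nothing to wait on and only the barrier matters.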
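To make the three steps concrete, here is a minimal standalone sketch that does not use CuTe at all. It assumes the CUDA pipeline primitives from <cuda_pipeline.h> (__pipeline_memcpy_async, __pipeline_commit, __pipeline_wait_prior) as rough analogues of an async copy(...), cp_async_fence() and cp_async_wait<0>(). The kernel name staged_reverse, the helper run_copy_demo and the constant kThreads are made up for illustration and are not part of sgemm_1.

```cuda
#include <cuda_runtime.h>
#include <cuda_pipeline.h>

constexpr int kThreads = 128;  // hypothetical block size for this sketch

// Each thread stages one float into shared memory and then reads an element
// staged by a *different* thread, so the block-wide barrier is really needed.
__global__ void staged_reverse(const float* in, float* out) {
  __shared__ float smem[kThreads];
  const int t = threadIdx.x;

  // Asynchronous global -> shared copy of 4 bytes (may lower to cp.async
  // on sm_80 and newer).
  __pipeline_memcpy_async(&smem[t], &in[t], sizeof(float));

  __pipeline_commit();       // 1. close the batch of async copies issued so far
  __pipeline_wait_prior(0);  // 2. wait for this thread's own batches to finish
  __syncthreads();           // 3. make all threads' smem writes visible

  out[t] = smem[kThreads - 1 - t];
}

// Host-side check: returns true if the output is the reversed input.
bool run_copy_demo() {
  float h_in[kThreads], h_out[kThreads];
  for (int i = 0; i < kThreads; ++i) h_in[i] = static_cast<float>(i);

  float* d_in = nullptr;
  float* d_out = nullptr;
  if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess) return false;
  if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) {
    cudaFree(d_in);
    return false;
  }

  cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
  staged_reverse<<<1, kThreads>>>(d_in, d_out);
  // This blocking copy also surfaces any error from the kernel launch.
  cudaError_t err =
      cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  if (err != cudaSuccess) return false;

  for (int i = 0; i < kThreads; ++i) {
    if (h_out[i] != static_cast<float>(kThreads - 1 - i)) return false;
  }
  return true;
}
```

Calling run_copy_demo() from a main() and compiling with nvcc should be enough to try it. Steps 1 and 2 only cover the calling thread's own copies; without step 3 a thread could read a shared-memory slot that another thread has not finished filling.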

#1231

cloudhan avatar Apr 07 '24 15:04 cloudhan

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] avatar May 09 '24 01:05 github-actions[bot]