cutlass
[QST] Why do we use three syncs in sgemm_1?
I am learning this example: https://github.com/NVIDIA/cutlass/blob/c4e3e122e266644c61b4af33d0cc09f4c391a64b/examples/cute/tutorial/sgemm_1.cu#L209-L211
What is your question?
cp_async_fence(); // Label the end of (potential) cp.async instructions
cp_async_wait<0>(); // Sync on all (potential) cp.async instructions
__syncthreads(); // Wait for all threads to write to smem
Why do we need three synchronization calls in sgemm_1?
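On some newer architectures, such as sm_80, the most basic form of copy(...), that is, one called without specifying a copy atom, might generate cp.async instructions. Because those copies are asynchronous, __syncthreads() alone is not enough, so you need the additional safety guarantees.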
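To make the role of each call concrete, here is a minimal standalone sketch. It is not the sgemm_1 code: it assumes the pipeline primitives from CUDA's cuda_pipeline.h (`__pipeline_memcpy_async`, `__pipeline_commit`, `__pipeline_wait_prior`), which, as far as I understand, play the same roles as the cp.async copy, `cp_async_fence()`, and `cp_async_wait<0>()` in CuTe. The names `reverse_tile`, `run_reverse_tile`, and `kThreads` are hypothetical and only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_pipeline.h>

constexpr int kThreads = 128;  // hypothetical tile size: one float per thread

__global__ void reverse_tile(const float* in, float* out) {
  __shared__ float smem[kThreads];
  int t = threadIdx.x;

  // Asynchronous 4-byte global->shared copy (may lower to cp.async on sm_80+).
  __pipeline_memcpy_async(&smem[t], &in[t], sizeof(float));
  __pipeline_commit();       // role of cp_async_fence(): close the batch of async copies
  __pipeline_wait_prior(0);  // role of cp_async_wait<0>(): this thread's copies have completed
  __syncthreads();           // every thread's smem writes are now visible to the whole block

  // Read an element that a *different* thread copied into shared memory.
  out[t] = smem[kThreads - 1 - t];
}

// Host-side driver: returns true if the output is the reversed input.
bool run_reverse_tile() {
  float h_in[kThreads], h_out[kThreads];
  for (int i = 0; i < kThreads; ++i) h_in[i] = static_cast<float>(i);

  float *d_in = nullptr, *d_out = nullptr;
  if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess) return false;
  if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) { cudaFree(d_in); return false; }

  cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
  reverse_tile<<<1, kThreads>>>(d_in, d_out);
  cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  if (err != cudaSuccess) return false;

  for (int i = 0; i < kThreads; ++i)
    if (h_out[i] != h_in[kThreads - 1 - i]) return false;
  return true;
}

int main() {
  bool ok = run_reverse_tile();
  std::printf("%s\n", ok ? "ok" : "mismatch");
  return ok ? 0 : 1;
}
```

The wait only covers the calling thread's own asynchronous copies, and `__syncthreads()` by itself does not wait for outstanding asynchronous copies, which is why the sketch keeps all three steps before any thread reads data copied by another thread.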
#1231
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.