cutlass
[QST] How to understand "Semaphore"
Hello CUTLASS experts, I'm confused by the implementation of Semaphore. Its "fetch" looks like this:
if (wait_thread) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
  asm volatile ("ld.global.acquire.gpu.b32 %0, [%1];\n" : "=r"(state) : "l"(lock));
#else
  asm volatile ("ld.global.cg.b32 %0, [%1];\n" : "=r"(state) : "l"(lock));
#endif
}
and its "release" looks like this:
if (wait_thread) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
  asm volatile ("st.global.release.gpu.b32 [%0], %1;\n" : : "l"(lock), "r"(status));
#else
  asm volatile ("st.global.cg.b32 [%0], %1;\n" : : "l"(lock), "r"(status));
#endif
}
I can understand the branch for arch >= 700, but I'm very confused by the branch for arch < 700. It just uses ld.global.cg.b32 and st.global.cg.b32, which are plain global memory reads and writes. There does not seem to be any mutual exclusion. Why?
So, how should I understand the global memory reads and writes above?
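One possible reading, which nobody in this thread confirms: the lock word may act as a turn counter rather than a mutex. If every participant spins on "fetch" until the word equals its own expected value, and only the current holder ever stores the next value, then there is never more than one writer and no atomic read-modify-write is needed. The sketch below is a host-side C++ analogue of that idea, not CUTLASS code. `TurnSemaphore`, `run_turn_handoff`, and the spin loop in `wait` are hypothetical names and assumptions made for illustration, and `std::atomic` acquire/release stands in for the `ld`/`st` instructions quoted above. It does not model how ordering is obtained on the arch < 700 path.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical host-side stand-in for the semaphore in the question.
struct TurnSemaphore {
  std::atomic<int> state{0};

  // Assumed analogue of spinning on fetch(): poll until it is our turn.
  void wait(int status) {
    while (state.load(std::memory_order_acquire) != status) {
      std::this_thread::yield();
    }
  }

  // Analogue of release(): a single store that hands the turn to the next waiter.
  void release(int status) {
    state.store(status, std::memory_order_release);
  }
};

// Each worker (standing in for a thread block) appends its index to a shared
// vector that has no mutex. Exclusion comes only from the turn order.
std::vector<int> run_turn_handoff(int num_workers) {
  assert(num_workers > 0);
  TurnSemaphore sem;
  std::vector<int> order;
  std::vector<std::thread> workers;
  // Launch in reverse so that start order does not match turn order.
  for (int i = num_workers - 1; i >= 0; --i) {
    workers.emplace_back([&sem, &order, i] {
      sem.wait(i);         // spin until state == i
      order.push_back(i);  // critical section: only the turn holder gets here
      sem.release(i + 1);  // hand over to worker i + 1
    });
  }
  for (auto& t : workers) {
    t.join();
  }
  return order;
}
```

Calling `run_turn_handoff(4)` returns the indices in turn order, 0 through 3, even though the workers are started in reverse. Under this reading, the loads and stores only need to be visible to the other participants in the right order. They do not need to be atomic read-modify-write operations, because two participants never contend to write the lock word at the same time.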
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
Closing due to inactivity