cutlass [QST]how to understand "Semaphore"

Hello, CUTLASS experts. I'm confused by the implementation of Semaphore. Its "fetch" looks like this:

    if (wait_thread) {
      #if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
      asm volatile ("ld.global.acquire.gpu.b32 %0, [%1];\n" : "=r"(state) : "l"(lock));
      #else
      asm volatile ("ld.global.cg.b32 %0, [%1];\n" : "=r"(state) : "l"(lock));
      #endif
    }

and its "release" looks like this:

    if (wait_thread) {
      #if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
      asm volatile ("st.global.release.gpu.b32 [%0], %1;\n" : : "l"(lock), "r"(status));
      #else
      asm volatile ("st.global.cg.b32 [%0], %1;\n" : : "l"(lock), "r"(status));
      #endif
    }

I can understand the branch for arch >= 700, but I'm very confused by the branch for arch < 700. It only uses ld.global.cg.b32 and st.global.cg.b32, which are plain global memory reads and writes. There does not seem to be any mutual exclusion. Why? How should I understand these global memory reads and writes?
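One possible reading, offered as an assumption rather than something confirmed in this thread: the snippets above never perform an atomic read-modify-write in either branch, so exclusion would have to come from how the flag is used, not from the load or store instruction itself. If each waiter spins until the flag equals a value that only it is waiting for, and then stores the next waiter's value, at most one waiter can proceed at a time. The sketch below is a host-side C++ analogue of that idea. `TurnFlag`, `wait`, `release`, and `run_in_turn` are hypothetical names for illustration, not the CUTLASS API, and `std::atomic` with acquire/release ordering stands in for the PTX loads and stores.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical host-side analogue of a turn-based flag (illustrative only,
// not the CUTLASS Semaphore class).
struct TurnFlag {
    std::atomic<int> lock{0};

    // Spin until the flag holds the value this worker is waiting for.
    void wait(int status) {
        while (lock.load(std::memory_order_acquire) != status) {
            std::this_thread::yield();
        }
    }

    // Publish a new value, handing the turn to whoever waits for it.
    void release(int status) {
        lock.store(status, std::memory_order_release);
    }
};

// Each worker appends its index to a shared, non-atomic vector.
// Only the worker whose index matches the flag may touch the vector.
std::vector<int> run_in_turn(int workers) {
    TurnFlag flag;
    std::vector<int> order;
    std::vector<std::thread> threads;
    // Start threads in reverse to show that the flag, not start order, decides turns.
    for (int i = workers - 1; i >= 0; --i) {
        threads.emplace_back([&flag, &order, i] {
            flag.wait(i);         // blocks until it is worker i's turn
            order.push_back(i);   // "critical section"
            flag.release(i + 1);  // pass the turn to worker i + 1
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return order;
}
```

In this sketch no compare-and-swap is needed because every waiter expects a distinct value. Whether the arch < 700 branch provides the same ordering guarantees as the acquire/release branch is a separate question that this example does not settle.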

Aug 04 '23 07:08 Shaquille-Wu

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

Sep 07 '23 14:09 github-actions[bot]

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

Dec 06 '23 14:12 github-actions[bot]

Closing due to inactivity

Feb 22 '24 14:02 mnicely
