[QST] How Does TMA Work in CUTLASS for Writing from Shared Memory to Global Memory?
Could you explain how TMA works in CUTLASS? For example, when writing from the shared memory tensor `sS` to the global memory tensor `gD`, the data appears to be written sequentially, i.e., element `sS[i]` maps directly to element `gD[i]`. Is this the correct behavior?
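To make the question concrete, it may help to separate a logical index `i` from the physical offset it resolves to. The sketch below is plain host-side C++, not CUTLASS or TMA code: `Layout2D`, `offset`, and `copy_by_coordinate` are hypothetical names introduced only for illustration. It assumes the copy is defined per logical coordinate, as CuTe's generic `copy` is, and that a 1-D index is unpacked with the leftmost mode varying fastest. Whether a TMA store follows the same rule is exactly what is being asked here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a rank-2 layout: maps a logical coordinate
// to a physical offset through a pair of strides.
struct Layout2D {
    std::size_t rows, cols;
    std::size_t row_stride, col_stride;

    std::size_t size() const { return rows * cols; }

    // Interpret the logical 1-D index i with the row varying fastest
    // (assumed convention), then apply the strides.
    std::size_t offset(std::size_t i) const {
        assert(i < size());
        std::size_t r = i % rows;
        std::size_t c = i / rows;
        return r * row_stride + c * col_stride;
    }
};

// Element-wise copy in logical index space: D(i) = S(i) for every i.
// The physical offsets on each side come from that side's own layout.
void copy_by_coordinate(const std::vector<float>& src, const Layout2D& src_layout,
                        std::vector<float>& dst, const Layout2D& dst_layout) {
    assert(src_layout.rows == dst_layout.rows);
    assert(src_layout.cols == dst_layout.cols);
    for (std::size_t i = 0; i < src_layout.size(); ++i) {
        dst[dst_layout.offset(i)] = src[src_layout.offset(i)];
    }
}
```

Under these assumptions, `sS[i]` and `gD[i]` always name the same logical element, but they sit at the same physical offset only when both layouts are identical. With a compact column-major 2x3 source (strides 1 and 2) and a row-major destination with a leading dimension of 8 (strides 8 and 1), logical element 1 is read from source offset 1 and written to destination offset 8.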