[QST] How Does TMA Work in CUTLASS for Writing from Shared Memory to Global Memory?

Open • ziyuhuang123 opened this issue 2 months ago • 1 comment

Could you explain how TMA works in CUTLASS? For example, when a TMA store copies the shared-memory tensor sS to the global-memory tensor gD, the data appears to be written sequentially, i.e., sS[i] maps directly to gD[i]. Is this the correct behavior?
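To make the question concrete, here is a small host-side C++ model of what I mean by the mapping. It is only a sketch under my own assumptions: a dense row-major tile in shared memory, a row-major matrix in global memory, and a tile placed at coordinate (m0, n0). The names `TileStore`, `smem_offset`, `gmem_offset`, and `store_tile` are hypothetical and are not CUTLASS, CuTe, or CUDA APIs. In this model the element at logical coordinate (i, j) of the tile lands at the same logical coordinate of the destination tile, so the linear offsets on the two sides agree only when the tile width equals the global row stride.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side model of a 2-D tile store.
// Not a CUTLASS/CuTe/CUDA API; it only illustrates the index mapping.
struct TileStore {
    std::size_t tile_m;     // tile rows
    std::size_t tile_n;     // tile columns
    std::size_t ld_global;  // row stride (in elements) of the global matrix
};

// Offset of logical element (i, j) inside the dense shared-memory tile.
inline std::size_t smem_offset(const TileStore& t, std::size_t i, std::size_t j) {
    return i * t.tile_n + j;
}

// Offset of the same logical element in global memory for a tile at (m0, n0).
inline std::size_t gmem_offset(const TileStore& t, std::size_t m0, std::size_t n0,
                               std::size_t i, std::size_t j) {
    return (m0 + i) * t.ld_global + (n0 + j);
}

// Copy the tile element by element, matching logical coordinates
// rather than linear offsets.
inline void store_tile(const TileStore& t, const std::vector<float>& smem,
                       std::vector<float>& gmem, std::size_t m0, std::size_t n0) {
    for (std::size_t i = 0; i < t.tile_m; ++i) {
        for (std::size_t j = 0; j < t.tile_n; ++j) {
            gmem[gmem_offset(t, m0, n0, i, j)] = smem[smem_offset(t, i, j)];
        }
    }
}
```

For example, with a 2x2 tile stored at (1, 2) into a 4x4 matrix, tile offsets 0, 1, 2, 3 land at global offsets 6, 7, 10, 11 in this model, so the second tile row is not contiguous with the first. My question is whether the TMA store in CUTLASS behaves like this coordinate-based mapping or like a flat sS[i] to gD[i] copy.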

Dec 23 '24 07:12 ziyuhuang123
