cutlass
[QST] Behavior of TMA Store and Wait Mechanism in CUTLASS
In CUTLASS, the tma_store_wait function corresponds to the PTX instruction cp.async.bulk.wait_group.read. From what I have observed while working with TMA, an explicit wait does not seem to be necessary after issuing a TMA store. The store appears to behave like expect_tx, in that the operation seems to complete on its own.
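For context, here is a minimal sketch of the call order I am asking about, written with inline PTX so it does not depend on CUTLASS headers. The `_sketch` helper names and the kernel are hypothetical; they only reflect my reading of what the CUTLASS wrappers (tma_store_fence, tma_store_arrive, tma_store_wait) lower to, and the actual TMA store is left as a comment because it needs a tensor map that is out of scope here.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical stand-ins for the CUTLASS wrappers (assumption: this mirrors
// their PTX). The guards make them no-ops on architectures without TMA.

// Makes prior shared-memory writes visible to the async (TMA) proxy.
__device__ __forceinline__ void tma_store_fence_sketch() {
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 900)
  asm volatile("fence.proxy.async.shared::cta;" ::: "memory");
#endif
}

// Commits all previously issued bulk async stores into one bulk async-group.
__device__ __forceinline__ void tma_store_arrive_sketch() {
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 900)
  asm volatile("cp.async.bulk.commit_group;" ::: "memory");
#endif
}

// Waits until at most Count of the most recent bulk async-groups are pending.
// As I read the PTX docs, the .read qualifier refers to the groups having
// finished reading their source (shared memory) locations.
template <int Count>
__device__ __forceinline__ void tma_store_wait_sketch() {
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 900)
  asm volatile("cp.async.bulk.wait_group.read %0;" : : "n"(Count) : "memory");
#endif
}

__global__ void epilogue_order_sketch() {
  // 1. Threads write the output tile to shared memory (omitted).
  tma_store_fence_sketch();          // 2. Fence before handing smem to TMA.
  __syncthreads();
  if (threadIdx.x == 0) {
    // 3. Issue the TMA store here (omitted: requires a tensor map).
    tma_store_arrive_sketch();       // 4. Commit the store(s) as a group.
    tma_store_wait_sketch<0>();      // 5. The wait this question is about.
  }
  __syncthreads();
}

int main() {
  epilogue_order_sketch<<<1, 32>>>();
  cudaError_t err = cudaDeviceSynchronize();
  std::printf("%s\n", cudaGetErrorString(err));
  return err == cudaSuccess ? 0 : 1;
}
```

Step 5 is the wait that looks unnecessary in my tests.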