cutlass
[FEA] make_tma_copy does not support shared-to-shared copy
Describe the bug
In PTX, I noticed that
cp.async.bulk.dst.src.completion_mechanism [dstMem], [srcMem], size, [mbar]
.dst = { .shared::cluster }
.src = { .shared::cta }
.completion_mechanism = { .mbarrier::complete_tx::bytes }
supports shared-to-shared copy (from .shared::cta to .shared::cluster). But in CuTe, make_tma_copy has to be global-to-shared. Why is that? Could it be extended to support shared-to-shared copies?
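For reference, here is a minimal sketch of issuing the instruction above directly through inline PTX, without going through make_tma_copy. The names bulk_copy_args_ok and bulk_copy_s2s are hypothetical and are not part of CuTe or CUTLASS. The sketch assumes an sm_90 target, that all addresses are already 32-bit shared-window addresses (for example from __cvta_generic_to_shared, with the destination and mbarrier mapped into the cluster window), and that the mbarrier has been initialized and told how many bytes to expect.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not a CuTe or CUTLASS API.
#if defined(__CUDACC__)
#define SKETCH_HOST_DEVICE __host__ __device__
#define SKETCH_DEVICE __device__ inline
#else
#define SKETCH_HOST_DEVICE
#define SKETCH_DEVICE inline
#endif

// cp.async.bulk requires 16-byte-aligned source and destination addresses
// and a size that is a multiple of 16 bytes.
SKETCH_HOST_DEVICE constexpr bool bulk_copy_args_ok(uint32_t dst, uint32_t src,
                                                    uint32_t bytes) {
  return dst % 16 == 0 && src % 16 == 0 && bytes % 16 == 0;
}

// Issues the shared::cta -> shared::cluster bulk copy from the PTX above.
// dst_cluster_addr and mbar_addr are assumed to be shared::cluster addresses;
// src_cta_addr is assumed to be a shared::cta address of the calling CTA.
SKETCH_DEVICE void bulk_copy_s2s(uint32_t dst_cluster_addr, uint32_t src_cta_addr,
                                 uint32_t bytes, uint32_t mbar_addr) {
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 900)
  asm volatile(
      "cp.async.bulk.shared::cluster.shared::cta.mbarrier::complete_tx::bytes "
      "[%0], [%1], %2, [%3];\n"
      :
      : "r"(dst_cluster_addr), "r"(src_cta_addr), "r"(bytes), "r"(mbar_addr)
      : "memory");
#else
  // Compiled out on the host and on pre-sm_90 targets.
  (void)dst_cluster_addr; (void)src_cta_addr; (void)bytes; (void)mbar_addr;
#endif
}
```

This is a one-dimensional bulk copy with no tensor map, so it is not a drop-in replacement for what make_tma_copy builds; it only shows that the shared-to-shared form can be reached from CUDA C++.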