[QST] Inquiry About the Computation Size of a Single cute::gemm Call in CUTLASS
What is your question?
Could you please explain how much computation a single cute::gemm call performs in CUTLASS? Multiple threads cooperate on the call, and the number of iterations is not stated explicitly the way it is in a loop written for CUDA cores, so I find it a bit confusing.