cutlass
[QST] Why do we have three GEMM parts in CUTLASS?
What is your question?
https://github.com/NVIDIA/cutlass/blob/f7b19de32c5d1f3cedfc735c2849f12b537522ee/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized.hpp#L477-L554
I understand that parts 2 and 3 handle k_iter 0 and k_iter in [1, k_end), respectively. What is the purpose of part 1, and why does it iterate over k_block? In my tests, part 1 is entered several times, and commenting it out produces an incorrect result.
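To make the loop structure I am asking about concrete, here is a minimal CPU sketch of a K reduction split into k_tiles, with each tile consumed in several k_blocks and the first tile peeled off so that its first block overwrites the accumulator. This is not the CUTLASS code: `dot_flat`, `mma_block`, `dot_tiled`, `tile_k`, and `block_k` are hypothetical names, and the mapping of this pattern onto the three parts in the linked file is an assumption on my side.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference: one flat loop over the whole K extent.
float dot_flat(const std::vector<float>& a, const std::vector<float>& b) {
    float acc = 0.0f;
    for (std::size_t k = 0; k < a.size(); ++k) acc += a[k] * b[k];
    return acc;
}

// Stand-in for one MMA instruction (hypothetical helper): consumes block_k
// elements. With zero_init set it overwrites the accumulator, otherwise it
// adds to it.
void mma_block(float& acc, const float* a, const float* b,
               int block_k, bool zero_init) {
    float partial = 0.0f;
    for (int i = 0; i < block_k; ++i) partial += a[i] * b[i];
    acc = zero_init ? partial : acc + partial;
}

// Tiled version: K is split into k_tiles of tile_k elements, and each tile
// is consumed in k_blocks of block_k elements.
float dot_tiled(const std::vector<float>& a, const std::vector<float>& b,
                int tile_k, int block_k) {
    assert(a.size() == b.size() && !a.empty());
    assert(tile_k % block_k == 0);
    assert(a.size() % static_cast<std::size_t>(tile_k) == 0);

    const int k_tiles  = static_cast<int>(a.size()) / tile_k;
    const int k_blocks = tile_k / block_k;

    float acc = -1234.5f;   // deliberately garbage: never zeroed explicitly
    bool zero_init = true;  // only the very first block overwrites acc

    // Peeled first k_tile: still needs its own loop over k_blocks.
    for (int k_block = 0; k_block < k_blocks; ++k_block) {
        const int off = k_block * block_k;
        mma_block(acc, a.data() + off, b.data() + off, block_k, zero_init);
        zero_init = false;
    }

    // Remaining k_tiles: every block accumulates.
    for (int k_tile = 1; k_tile < k_tiles; ++k_tile) {
        for (int k_block = 0; k_block < k_blocks; ++k_block) {
            const int off = k_tile * tile_k + k_block * block_k;
            mma_block(acc, a.data() + off, b.data() + off, block_k, false);
        }
    }
    return acc;
}
```

In this sketch, skipping the peeled loop would leave the garbage value in `acc` and drop the first tile's contribution, which resembles the incorrect result I see when part 1 is commented out.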