4grass
My code:
```
using GmemTiledCopyL = decltype(make_tiled_copy( Copy_Atom{}, Layout{}, Layout{}));
using SmemLayoutL = decltype(Layout{});
__shared__ cute::array_aligned l;
GmemTiledCopyL gmem_tiled_copy_L;
auto gmem_thr_copy_LD = gmem_tiled_copy_LD.get_thread_slice(tid);
Tensor _L = make_tensor(make_gmem_ptr(reinterpret_cast(L), make_shape(C), make_stride(Int{}));
Tensor...
```
I need to keep blockM and blockN fixed to ensure batch invariance. Can this be controlled through the CUTLASS GEMM interface? My kernel type is `using GemmKernel = cutlass::gemm::kernel::GemmUniversal< cute::Shape, CollectiveMainloop, CollectiveEpilogue>;`. Thanks in advance!
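For context on why fixed tile sizes matter here, a minimal CPU sketch follows. It does not use CUTLASS at all: `tiled_gemm` and `rows_match_across_batch_sizes` are hypothetical names made up for illustration, and `BM`, `BN`, `BK` only stand in for blockM, blockN, blockK. The idea it tries to show is that when the tile sizes are compile-time constants, the order in which each output element accumulates over K depends only on `BK` and K, so a given row should come out bit-identical whether it is computed alone or inside a larger batch. As far as I know, in the CUTLASS 3.x API the tile shape is likewise a compile-time type given to the collective mainloop, so it is fixed per kernel instantiation; any dispatch between differently tiled kernels, or any scheduling that splits K based on problem size, would be a separate thing to pin down.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical CPU stand-in for a tiled GEMM (not CUTLASS code).
// BM, BN, BK mimic blockM, blockN, blockK and are fixed at compile time.
// A is M x K, B is K x N, both row-major; returns C = A * B (M x N).
template <int BM, int BN, int BK>
std::vector<float> tiled_gemm(const std::vector<float>& A,
                              const std::vector<float>& B,
                              int M, int N, int K) {
    std::vector<float> C(static_cast<std::size_t>(M) * N, 0.0f);
    for (int m0 = 0; m0 < M; m0 += BM) {
        for (int n0 = 0; n0 < N; n0 += BN) {
            for (int k0 = 0; k0 < K; k0 += BK) {
                const int m1 = std::min(m0 + BM, M);
                const int n1 = std::min(n0 + BN, N);
                const int k1 = std::min(k0 + BK, K);
                for (int m = m0; m < m1; ++m) {
                    for (int n = n0; n < n1; ++n) {
                        // Per-tile partial sum: the reduction order over k
                        // depends only on BK and K, never on M.
                        float partial = 0.0f;
                        for (int k = k0; k < k1; ++k) {
                            partial += A[static_cast<std::size_t>(m) * K + k] *
                                       B[static_cast<std::size_t>(k) * N + n];
                        }
                        C[static_cast<std::size_t>(m) * N + n] += partial;
                    }
                }
            }
        }
    }
    return C;
}

// Computes the first m_small rows once as their own small batch and once as
// part of a batch of m_large rows, then compares the results bit for bit.
bool rows_match_across_batch_sizes(int m_small, int m_large) {
    const int N = 40;
    const int K = 100;
    std::vector<float> A(static_cast<std::size_t>(m_large) * K);
    std::vector<float> B(static_cast<std::size_t>(K) * N);
    // Deterministic, non-trivial test data.
    for (std::size_t i = 0; i < A.size(); ++i) {
        A[i] = 0.1f * static_cast<float>(static_cast<int>(i % 13) - 6);
    }
    for (std::size_t i = 0; i < B.size(); ++i) {
        B[i] = 0.3f * static_cast<float>(static_cast<int>(i % 7) - 3);
    }
    // The small batch is simply the leading rows of the large one.
    std::vector<float> A_small(
        A.begin(), A.begin() + static_cast<std::ptrdiff_t>(m_small) * K);

    const std::vector<float> C_small =
        tiled_gemm<16, 8, 32>(A_small, B, m_small, N, K);
    const std::vector<float> C_large =
        tiled_gemm<16, 8, 32>(A, B, m_large, N, K);

    // Row-major with the same N, so the leading rows are contiguous in both.
    return std::memcmp(C_small.data(), C_large.data(),
                       C_small.size() * sizeof(float)) == 0;
}
```

Calling `rows_match_across_batch_sizes(1, 37)` compares a single row computed alone against the same row computed as part of a 37-row batch. The check is bitwise rather than tolerance-based on purpose, since batch invariance is about identical results, not merely close ones.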