[CK Tile] Need GroupGEMM with make_argument on device
Currently, CK Tile GroupGEMM prepares its metadata on the host, which requires transferring that metadata back and forth between the host and the device. https://github.com/ROCm/composable_kernel/blob/6b6fcd370bb2e5572422a1ca71d261df02b6263e/include/ck_tile/ops/gemm/kernel/grouped_gemm_kernel.hpp#L98
To avoid this overhead, we need a GroupGEMM kernel like the one in old CK: a persistent kernel that reads the GEMM shapes from device memory and calculates the offsets and block_id on the fly.
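To make the request concrete, here is a rough sketch of the on-the-fly mapping I have in mind, written as plain host-side C++ so it can be run without a GPU. It is not the old CK implementation and not an existing CK Tile API: `GemmShape`, `TileCoord`, `map_tile`, and `persistent_loop` are hypothetical names, and the linear scan over groups is just the simplest possible lookup. In a real kernel, `shapes` would be a device pointer, `block_id` and `grid_size` would come from the launch configuration, and the loop body would run the GEMM tile instead of recording it.

```cpp
#include <vector>

// Hypothetical per-group descriptor; in a real kernel this array lives in device memory.
struct GemmShape { int M, N, K; };

// Hypothetical result of the mapping: which group, and which output tile inside it.
struct TileCoord { int group; int tile_m; int tile_n; };

constexpr int ceil_div(int a, int b) { return (a + b - 1) / b; }

// Map a flat tile index to (group, tile_m, tile_n) by scanning the groups
// and accumulating their tile counts. Returns group == -1 when the index
// is past the last tile of the last group.
inline TileCoord map_tile(const GemmShape* shapes, int group_count,
                          int tile_idx, int m_per_block, int n_per_block)
{
    int first_tile = 0; // flat index of the first tile of the current group
    for (int g = 0; g < group_count; ++g) {
        const int tiles_m = ceil_div(shapes[g].M, m_per_block);
        const int tiles_n = ceil_div(shapes[g].N, n_per_block);
        const int tiles   = tiles_m * tiles_n;
        if (tile_idx < first_tile + tiles) {
            const int local = tile_idx - first_tile; // row-major tile order
            return {g, local / tiles_n, local % tiles_n};
        }
        first_tile += tiles;
    }
    return {-1, 0, 0};
}

// Host emulation of one persistent workgroup: it starts at its own block id
// and strides by the grid size until no tiles are left.
inline std::vector<TileCoord> persistent_loop(const std::vector<GemmShape>& shapes,
                                              int block_id, int grid_size,
                                              int m_per_block, int n_per_block)
{
    std::vector<TileCoord> visited;
    for (int t = block_id;; t += grid_size) {
        const TileCoord c = map_tile(shapes.data(), static_cast<int>(shapes.size()),
                                     t, m_per_block, n_per_block);
        if (c.group < 0) break;  // all groups exhausted
        visited.push_back(c);    // a real kernel would compute this GEMM tile here
    }
    return visited;
}
```

With this scheme the host only has to launch a fixed-size grid; the shapes can be written by a previous kernel and never need to be copied back to the host.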
@aosewski Could you help port your GroupGEMM design from old CK to CK Tile?