TransformerEngine
[Feature Request] Grouped GEMM kernel
Thanks for the awesome library! I'm wondering whether there are plans to support a grouped GEMM op, as in https://github.com/tgale96/grouped_gemm/tree/main
For more information, it seems that FP8 is supported in the CUTLASS grouped GEMM example:
https://github.com/NVIDIA/cutlass/blob/main/examples/57_hopper_grouped_gemm/57_hopper_grouped_gemm.cu#L94
A GroupedLinear layer has been added in TE v1.9, and it has FP8 support.
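For anyone landing here who is unsure what a grouped GEMM computes, below is a minimal pure-Python reference sketch of the semantics only. It is not the TE or CUTLASS kernel, and the names `matmul`, `grouped_gemm_reference`, and `m_splits` are illustrative assumptions rather than any library's API. The input rows are split into consecutive groups, and each group is multiplied by its own weight matrix.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (M x K) @ (K x N) product on nested lists."""
    if not a:
        return []  # an empty group contributes no output rows
    k, n = len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def grouped_gemm_reference(inp: Matrix, weights: List[Matrix], m_splits: List[int]) -> Matrix:
    """Hypothetical reference: rows of `inp` are split into consecutive
    groups of sizes `m_splits`; group i is multiplied by `weights[i]`."""
    assert len(weights) == len(m_splits), "one weight matrix per group"
    assert sum(m_splits) == len(inp), "group sizes must cover all rows"
    out: Matrix = []
    start = 0
    for w, m in zip(weights, m_splits):
        out.extend(matmul(inp[start:start + m], w))
        start += m
    return out


if __name__ == "__main__":
    inp = [[1, 2], [3, 4], [5, 6]]
    weights = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]  # identity, then 2 * identity
    print(grouped_gemm_reference(inp, weights, [1, 2]))
```

A fused grouped GEMM kernel produces the same result as this loop but launches the per-group products together, which matters when groups are many and small (for example, per-expert batches in MoE layers).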