[QST][CuteDSL] warp mma support
I have noticed that CuteDSL currently only supports fp16/bf16 warp-level MMA with the m16n8k16 and m16n8k8 shapes:
https://github.com/NVIDIA/cutlass/blob/ec8daf642d69fc31352ac6fa6e14a0de9019604b/python/CuTeDSL/cutlass/cute/nvgpu/warp/mma.py
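For context, this is roughly how I construct that op today (a minimal sketch; the argument order and the `cute.make_tiled_mma` usage are my reading of the linked `mma.py` and the DSL examples, so please correct me if I have it wrong):

```python
import cutlass
import cutlass.cute as cute

# fp16 A/B accumulating into fp32 with the m16n8k16 instruction --
# one of the only shape/dtype combinations exposed by warp/mma.py today.
op = cute.nvgpu.warp.MmaF16BF16Op(
    cutlass.Float16,   # A/B dtype (Float16 or BFloat16)
    cutlass.Float32,   # accumulator dtype
    (16, 8, 16),       # instruction shape MNK (or (16, 8, 8))
)

# Assumed to mirror the C++ CuTe API here.
tiled_mma = cute.make_tiled_mma(op)
```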
Are there plans to support more variants in the future, such as:
- Turing support
- Volta support (with mma shape m8n8k4)
- B1 / INT4 / INT8 / FP4 / FP6 / FP8 / TF32 support
BTW, the documentation comments for MmaF16BF16Op seem to be incorrect. My understanding is that this op is not a tcgen05 instruction, right?
https://github.com/NVIDIA/cutlass/blob/ec8daf642d69fc31352ac6fa6e14a0de9019604b/python/CuTeDSL/cutlass/cute/nvgpu/warp/mma.py#L43-L50