MHA doesn't support the mfma_f32_16x16x16f16 instruction
Hello, I tried changing the mfma_f32_32x32x8f16 instruction to mfma_f32_16x16x16f16 in grouped_multihead_attention_forward_v2.cpp, but I get wrong results. Is there anything else that needs to be modified besides the GEMM parameters?
Different mfma instructions have different register input/output layouts. You can refer to this: https://github.com/RadeonOpenCompute/amd_matrix_instruction_calculator
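For illustration only (this is not code from CK), here is a minimal HIP sketch built directly on the MFMA compiler builtins. It shows that the two instructions consume similar A/B fragments but produce accumulators of different shapes (16 floats per lane for a 32x32 output tile vs. 4 floats per lane for a 16x16 tile), which is why the output-side code has to change along with the GEMM parameters. The kernel name and the linear per-lane stores are placeholders; the real lane-to-(row, col) mappings come from the calculator linked above.

```cpp
#include <hip/hip_runtime.h>

typedef _Float16 half4     __attribute__((ext_vector_type(4)));
typedef float    float4_t  __attribute__((ext_vector_type(4)));
typedef float    float16_t __attribute__((ext_vector_type(16)));

// Placeholder kernel: compares the accumulator shapes of the two MFMA builtins.
// Assumes a single 64-lane wavefront on MFMA-capable (CDNA-class) hardware.
__global__ void mfma_accumulator_shapes(const _Float16* a, const _Float16* b,
                                        float* d32, float* d16)
{
    const int lane = threadIdx.x;

    // Each lane loads a 4-element fragment of A and B (real A/B layouts omitted;
    // they also differ per instruction and are listed by the calculator).
    half4 a_frag = *reinterpret_cast<const half4*>(a + 4 * lane);
    half4 b_frag = *reinterpret_cast<const half4*>(b + 4 * lane);

    // mfma_f32_32x32x8f16: 32x32 output tile, 16 accumulator registers per lane.
    float16_t acc32 = {};
    acc32 = __builtin_amdgcn_mfma_f32_32x32x8f16(a_frag, b_frag, acc32, 0, 0, 0);

    // mfma_f32_16x16x16f16: 16x16 output tile, 4 accumulator registers per lane.
    float4_t acc16 = {};
    acc16 = __builtin_amdgcn_mfma_f32_16x16x16f16(a_frag, b_frag, acc16, 0, 0, 0);

    // Dumped linearly per lane for illustration only; a real kernel must map each
    // register and lane to its (row, col) position, which differs per instruction.
    for (int r = 0; r < 16; ++r) d32[16 * lane + r] = acc32[r];
    for (int r = 0; r < 4;  ++r) d16[4 * lane + r]  = acc16[r];
}
```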
Hi, do you mean I should modify the internal MHA code in CK if I use a different mfma instruction? Or could you provide a simple MHA example that uses mfma_f32_16x16x16f16?
@hengyeliu You may refer to our mha gemm: https://github.com/ROCm/composable_kernel/blob/84832fc42d71e446fa2ddbf88b96fc2c05b21b49/include/ck/tensor_operation/gpu/warp/xdlops_gemm.hpp#L202
You also need to transform the output of the MHA GEMM accordingly before writing it out to global memory.
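To make that concrete, below is a hedged sketch (again, not the CK implementation) of an epilogue for mfma_f32_16x16x16f16. The row/column formulas are the layout I'd expect the matrix instruction calculator to report for this instruction (each lane owning one column and four consecutive rows of the 16x16 D tile); please verify them with the tool for your target architecture before relying on them.

```cpp
#include <hip/hip_runtime.h>

typedef _Float16 half4    __attribute__((ext_vector_type(4)));
typedef float    float4_t __attribute__((ext_vector_type(4)));

// Placeholder kernel: writes one mfma_f32_16x16x16f16 accumulator tile to a
// row-major 16x16 buffer. Assumes one 64-lane wavefront; the lane->(row, col)
// mapping below is an assumption to be checked against the calculator.
__global__ void store_d_16x16x16(const _Float16* a, const _Float16* b,
                                 float* d, int ld)
{
    const int lane = threadIdx.x;

    // A/B fragment loads are schematic; their layouts are also instruction-specific.
    half4 a_frag = *reinterpret_cast<const half4*>(a + 4 * lane);
    half4 b_frag = *reinterpret_cast<const half4*>(b + 4 * lane);

    float4_t acc = {};
    acc = __builtin_amdgcn_mfma_f32_16x16x16f16(a_frag, b_frag, acc, 0, 0, 0);

    // Assumed D layout: column = lane % 16, rows = 4 * (lane / 16) + r, r = 0..3.
    const int col      = lane % 16;
    const int row_base = 4 * (lane / 16);
    for (int r = 0; r < 4; ++r)
        d[(row_base + r) * ld + col] = acc[r];
}
```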
@hengyeliu Has your issue been resolved? If so, please close the ticket. Thanks!
@hengyeliu Closing ticket. Please feel free to leave a comment if you need further assistance and we'll re-open the ticket. Thanks!