
mha doesn't support the mfma_f32_16x16x16f16 instruction

Open hengyeliu opened this issue 2 years ago • 3 comments

Hello, I tried to change the mfma_f32_32x32x8f16 instruction to the mfma_f32_16x16x16f16 instruction in grouped_multihead_attention_forward_v2.cpp, but I get incorrect results. Is there anything else that needs to be modified besides the GEMM parameters?

hengyeliu avatar Dec 02 '23 04:12 hengyeliu

Different mfma instructions have different register input/output layouts. You can refer to this: https://github.com/RadeonOpenCompute/amd_matrix_instruction_calculator
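To make the layout difference concrete, here is a small sketch of the output (D-matrix) element-to-lane/register mappings that the matrix_instruction_calculator reports for these two instructions on CDNA. The helper names are hypothetical (not part of CK); please verify the exact mappings with the calculator tool for your target GPU.

```cpp
#include <cassert>
#include <utility>

// v_mfma_f32_16x16x16f16: D is 16x16; each of the 64 lanes holds 4 f32 values.
// Element D[i][j] lives in (lane, register) as computed here: groups of 16
// lanes cover 4 consecutive rows, and the register index selects the row
// within the group.
std::pair<int, int> lane_reg_16x16x16(int i, int j) {
    int lane = (i / 4) * 16 + j;
    int reg  = i % 4;
    return {lane, reg};
}

// v_mfma_f32_32x32x8f16: D is 32x32; each of the 64 lanes holds 16 f32 values.
// Rows alternate between the two 32-lane halves, and the 16 registers split
// into 4 groups of 4 rows each.
std::pair<int, int> lane_reg_32x32x8(int i, int j) {
    int lane = ((i / 4) % 2) * 32 + j;
    int reg  = (i / 8) * 4 + (i % 4);
    return {lane, reg};
}
```

The key point: the same logical output element lands in a different lane and register depending on the instruction, so any code that assumed the 32x32x8 layout (e.g. the epilogue that writes results back) breaks silently when only the GEMM tile parameters are changed.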

zjing14 avatar Dec 04 '23 17:12 zjing14

> Different mfma instructions have different register input/output layouts. You can refer to this: https://github.com/RadeonOpenCompute/amd_matrix_instruction_calculator

Hi, do you mean I should modify the internal MHA code in CK if I use a different mfma instruction? Or could you provide a simple MHA example that uses mfma_f32_16x16x16f16?

hengyeliu avatar Dec 05 '23 10:12 hengyeliu

@hengyeliu You may refer to our mha gemm: https://github.com/ROCm/composable_kernel/blob/84832fc42d71e446fa2ddbf88b96fc2c05b21b49/include/ck/tensor_operation/gpu/warp/xdlops_gemm.hpp#L202

You need to transform the MHA output layout accordingly before writing it out to global memory.
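As an illustration of that write-out step, each lane must invert the output layout to find which global (row, col) each of its accumulator registers belongs to. The sketch below assumes the 16x16x16 layout from the matrix_instruction_calculator (groups of 16 lanes covering 4 rows); the helper names are made up for illustration and are not CK APIs.

```cpp
#include <cassert>

// Hypothetical helpers (not CK code): for v_mfma_f32_16x16x16f16, recover the
// global row/column of accumulator register `reg` held by `lane` (0..63).
int out_row_16x16x16(int lane, int reg) {
    // Each group of 16 lanes covers 4 rows; `reg` picks the row in the group.
    return (lane / 16) * 4 + reg;
}

int out_col_16x16x16(int lane, int /*reg*/) {
    // The column is fixed by the lane's position within its 16-lane group.
    return lane % 16;
}
```

In a kernel, each lane would then store `acc[reg]` at `C[out_row * ldc + out_col]` (plus the tile offset). With the 32x32x8 instruction this inverse mapping is different, which is why swapping instructions without updating the epilogue produces wrong results.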

zjing14 avatar Jan 30 '24 19:01 zjing14

@hengyeliu Has your issue been resolved? If so, please close the ticket. Thanks!

ppanchad-amd avatar Aug 21 '24 17:08 ppanchad-amd

@hengyeliu Closing ticket. Please feel free to leave a comment if you need further assistance and we'll re-open the ticket. Thanks!

ppanchad-amd avatar Sep 25 '24 17:09 ppanchad-amd