
Add GGML_HIP_ROCWMMA_FATTN to enable rocWMMA for FlashAttention

Opened by hjc4869 • 2 comments

  • Add a new option GGML_HIP_ROCWMMA_FATTN, defaulting to OFF
  • Check for rocWMMA header availability when GGML_HIP_ROCWMMA_FATTN is enabled
  • Define FP16_MMA_AVAILABLE when GGML_HIP_ROCWMMA_FATTN is enabled and the target is supported by rocWMMA (CDNA / RDNA3)
  • Use rocWMMA in FlashAttention kernel when possible
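Assuming this option is wired up like the other GGML HIP build flags, enabling it at configure time might look like the following (GGML_HIP_ROCWMMA_FATTN is the option added by this PR; the other flag names and paths are assumptions based on the usual llama.cpp HIP build and should be checked against the build docs):

```shell
# Configure llama.cpp with the HIP backend and the new rocWMMA
# FlashAttention path enabled (the option defaults to OFF).
# AMDGPU_TARGETS is an illustrative example; set it to your GPU's
# architecture (e.g. a CDNA or RDNA3 target supported by rocWMMA).
cmake -B build \
    -DGGML_HIP=ON \
    -DGGML_HIP_ROCWMMA_FATTN=ON \
    -DAMDGPU_TARGETS=gfx1100
cmake --build build --config Release
```

If the rocWMMA headers are not found, the configure step would be expected to fail the availability check rather than silently falling back.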

Related issue: https://github.com/ggml-org/llama.cpp/issues/10439

hjc4869 · Feb 22 '25 19:02