llama.cpp
Add GGML_HIP_ROCWMMA_FATTN to enable rocWMMA for FlashAttention
- Add a new option GGML_HIP_ROCWMMA_FATTN, which defaults to OFF
- Check for rocWMMA header availability when GGML_HIP_ROCWMMA_FATTN is enabled
- Define FP16_MMA_AVAILABLE when GGML_HIP_ROCWMMA_FATTN is enabled and the target is supported by rocWMMA (CDNA / RDNA3), as in the sketch after this list
- Use rocWMMA in the FlashAttention kernel when possible
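The gating described above can be pictured as a compile-time guard. The following is a minimal sketch, not the actual code in the commit: the real checks live in llama.cpp's HIP build scripts and common headers, and the header probe via __has_include as well as the exact architecture macro names (CDNA, RDNA3) are assumptions here for illustration.

```cpp
// Illustrative sketch of the compile-time gating (not the commit's exact code).
#if defined(GGML_HIP_ROCWMMA_FATTN)
#    ifdef __has_include
#        if __has_include(<rocwmma/rocwmma.hpp>)
#            include <rocwmma/rocwmma.hpp>
             // Only targets with rocWMMA support get the FP16 MMA path.
#            if defined(CDNA) || defined(RDNA3)
                 // Signals to the FlashAttention code that the WMMA-based
                 // kernel variant can be compiled in for this target.
#                define FP16_MMA_AVAILABLE
#            endif
#        endif
#    endif
#endif
```

With the option turned on at configure time (for example by passing -DGGML_HIP_ROCWMMA_FATTN=ON alongside the usual HIP build flags), targets that do not satisfy the guard above simply keep using the existing non-WMMA FlashAttention path.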
Related issue: https://github.com/ggml-org/llama.cpp/issues/10439