FlagAttention
Support grouped query attention (GQA) for flash_attn
Support grouped query attention (GQA) for flash_attn (related kernels: fwd, bwd, split_kv, total_attention).
The GQA paper:
Ainslie, Joshua, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, and Sumit Sanghai. “GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints.” arXiv:2305.13245, December 23, 2023.
Mind the layout of the heads in the query: with GQA the number of query heads is a multiple of the number of key/value heads, and consecutive query heads form a group that shares one key/value head, so query head i attends to key/value head i // (num_q_heads // num_kv_heads). See the sketch below.
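For reference, here is a minimal plain-PyTorch sketch of that head layout (not the Triton kernels touched by this PR); the function name `gqa_reference` and the argument shapes are illustrative, and it assumes the usual GQA convention where query head `i` uses key/value head `i // group`:

```python
import torch

def gqa_reference(q, k, v, causal=False):
    # q: (batch, num_q_heads, seqlen_q, head_dim)
    # k, v: (batch, num_kv_heads, seqlen_k, head_dim)
    b, hq, m, d = q.shape
    hk = k.shape[1]
    assert hq % hk == 0, "num_q_heads must be a multiple of num_kv_heads"
    group = hq // hk

    # Expand each kv head `group` times so that query head i
    # lines up with kv head i // group.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)

    scores = torch.einsum("bhmd,bhnd->bhmn", q, k) / d ** 0.5
    if causal:
        n = k.shape[2]
        # Bottom-right aligned causal mask (handles seqlen_q != seqlen_k).
        mask = torch.ones(m, n, dtype=torch.bool, device=q.device).tril(n - m)
        scores = scores.masked_fill(~mask, float("-inf"))
    p = scores.softmax(dim=-1)
    return torch.einsum("bhmn,bhnd->bhmd", p, v)
```

A reference like this is handy for checking the fwd/bwd kernels: run it on small random inputs and compare outputs and gradients against the Triton implementation.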