FlagAttention
Support grouped query attention (GQA) for flash_attn
Support grouped query attention (GQA) for flash_attn (related kernels: fwd, bwd, split_kv, total_attention).
The GQA paper:
Ainslie, Joshua, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, and Sumit Sanghai. “GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints.” arXiv:2305.13245, December 23, 2023.
Mind the layout of the heads in the query: with GQA the number of query heads is a multiple of the number of key/value heads, and consecutive query heads form a group that shares one key/value head, so query head i attends to key/value head i // (num_q_heads // num_kv_heads). See the sketch below.
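For reference, here is a minimal plain-PyTorch sketch of that head layout (not the Triton kernels touched by this PR); the function name `gqa_reference` and the argument shapes are illustrative, and it assumes the usual GQA convention where query head `i` uses key/value head `i // group`:

```python
import torch

def gqa_reference(q, k, v, causal=False):
    # q: (batch, num_q_heads, seqlen_q, head_dim)
    # k, v: (batch, num_kv_heads, seqlen_k, head_dim)
    b, hq, m, d = q.shape
    hk = k.shape[1]
    assert hq % hk == 0, "num_q_heads must be a multiple of num_kv_heads"
    group = hq // hk

    # Expand each kv head `group` times so that query head i
    # lines up with kv head i // group.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)

    scores = torch.einsum("bhmd,bhnd->bhmn", q, k) / d ** 0.5
    if causal:
        n = k.shape[2]
        # Bottom-right aligned causal mask (handles seqlen_q != seqlen_k).
        mask = torch.ones(m, n, dtype=torch.bool, device=q.device).tril(n - m)
        scores = scores.masked_fill(~mask, float("-inf"))
    p = scores.softmax(dim=-1)
    return torch.einsum("bhmn,bhnd->bhmd", p, v)
```

A reference like this is handy for checking the fwd/bwd kernels: run it on small random inputs and compare outputs and gradients against the Triton implementation.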