FlagAttention

A collection of memory efficient attention operators implemented in the Triton language.

4 FlagAttention issues

Support grouped query attention (GQA) for flash_attn (related kernels: fwd, bwd, split_kv, total_attention). The GQA paper > Ainslie, Joshua, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, and Sumit Sanghai. “GQA:...
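For context, a minimal PyTorch sketch of the head mapping GQA requires, where each key/value head is shared by a group of query heads (names and shapes here are illustrative, not the repo's kernel API):

```python
import math
import torch

def gqa_reference(q, k, v):
    """q: (B, Hq, T, D); k, v: (B, Hkv, T, D) with Hq a multiple of Hkv."""
    b, hq, t, d = q.shape
    hkv = k.shape[1]
    group = hq // hkv
    # Repeat each K/V head so it is shared by `group` consecutive query heads.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)
    p = torch.softmax(scores, dim=-1)
    return p @ v
```

A fused Triton kernel would avoid this explicit repeat by indexing the shared K/V head directly (kv_head = q_head // group) inside the fwd/bwd/split_kv loops.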

Adds bias to attention. Many tests fail for me (that's why I'm opening a draft PR), especially the BTHD and longer-sequence ones (my GPU has 12 GB), but manual PyTorch tests...
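For reference, a small PyTorch sketch of the semantics an attention-bias kernel is usually tested against: the bias is added to the pre-softmax scores (function name and shapes are illustrative):

```python
import math
import torch

def attention_with_bias(q, k, v, bias):
    """q, k, v: (B, H, T, D); bias: broadcastable to (B, H, T, T)."""
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)
    scores = scores + bias  # bias enters before the softmax
    p = torch.softmax(scores, dim=-1)
    return p @ v
```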

The PyTorch baseline implementation of [`scaled_dot_product_attention`](https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html#torch.nn.functional.scaled_dot_product_attention) provides dropout as an argument. Fusing it into the Triton kernel would replicate that functionality, since dropout is applied to the attention scores, not...
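As a hedged sketch of the semantics being requested, this is where dropout sits in a plain PyTorch reference: it is applied to the post-softmax attention weights, not to the final output, which is what a fused kernel would need to reproduce (function name is illustrative):

```python
import math
import torch
import torch.nn.functional as F

def sdpa_with_dropout(q, k, v, dropout_p=0.1, training=True):
    """Reference mirroring the dropout_p semantics of F.scaled_dot_product_attention."""
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)
    attn = torch.softmax(scores, dim=-1)
    # Dropout zeroes entries of the attention weights before the value matmul,
    # so a fused kernel must drop inside the inner loop, not on the output.
    attn = F.dropout(attn, p=dropout_p, training=training)
    return attn @ v
```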