Flash Attention V1 support
I noticed this repo: https://github.com/huggingface/candle-flash-attn-v1. I was curious whether there is any plan on the roadmap for a feature enabling flash-attn-v1 (rather than v2) in order to support a wider range of GPUs.
Maybe @LaurentMazare?
Bump on this if possible.
@LaurentMazare @EricLBuehler can you please advise on this?
Hey @Murad-Awad! We have candle-extensions now, and you can use the candle-flash-attn-v1 crate. The function is a 1:1 drop-in replacement for the v2 implementation here in Candle.
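For anyone landing here later, here is a minimal usage sketch. It assumes the crate is pulled in as a git dependency and exposes a `flash_attn` function with the same signature as `candle_flash_attn::flash_attn` (per the "1:1 drop-in" note above); the dependency line and tensor shapes below are illustrative, not taken from this thread:

```rust
// Minimal sketch: swapping in the v1 kernel, assuming candle-flash-attn-v1
// exposes `flash_attn` with the same signature as candle_flash_attn::flash_attn.
// Cargo.toml (illustrative git dependency, adjust to how you vendor the crate):
//   candle-flash-attn-v1 = { git = "https://github.com/huggingface/candle-flash-attn-v1" }
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let device = Device::new_cuda(0)?;
    // Flash attention kernels expect f16/bf16 inputs shaped
    // (batch, seq_len, num_heads, head_dim).
    let q = Tensor::randn(0f32, 1.0, (1, 128, 8, 64), &device)?.to_dtype(DType::F16)?;
    let k = Tensor::randn(0f32, 1.0, (1, 128, 8, 64), &device)?.to_dtype(DType::F16)?;
    let v = Tensor::randn(0f32, 1.0, (1, 128, 8, 64), &device)?.to_dtype(DType::F16)?;
    let softmax_scale = 1.0 / (64f32).sqrt();
    // Drop-in call: same arguments as candle_flash_attn::flash_attn
    // (q, k, v, softmax_scale, causal).
    let out = candle_flash_attn_v1::flash_attn(&q, &k, &v, softmax_scale, true)?;
    println!("attention output shape: {:?}", out.shape());
    Ok(())
}
```

Since the call shape matches, switching an existing model between v1 and v2 should only require changing the dependency and the crate path in the call site.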
Let me know if you have any issues using this.