
Inquiry about Turing GPU support progress in FlashAttention-2

Open · wangsen-zy opened this issue 8 months ago

Dear FlashAttention Development Team,

First off, thank you so much for developing such an excellent library as FlashAttention!

I noticed in the project's README (https://github.com/Dao-AILab/flash-attention) that FlashAttention-2 currently supports Ampere, Ada, or Hopper GPUs. The README also mentions that "Support for Turing GPUs (T4, RTX 2080) is coming soon, please use FlashAttention 1.x for Turing GPUs for now."
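
As a quick illustration of where Turing falls short of that support matrix, here is a minimal check (assuming only PyTorch): Turing cards (T4, RTX 20-series) report compute capability 7.5, while Ampere, Ada, and Hopper correspond to 8.0 and newer.

```python
import torch

# Turing (T4, RTX 20-series) reports compute capability 7.5; the README's
# supported architectures (Ampere, Ada, Hopper) are 8.0 and newer.
major, minor = torch.cuda.get_device_capability()
if (major, minor) >= (8, 0):
    print("This GPU meets FlashAttention-2's stated requirement.")
else:
    print(f"sm_{major}{minor}: pre-Ampere, use FlashAttention 1.x for now.")
```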

I am still using a Turing-architecture GPU (specifically an RTX 2080 Ti) and am very keen to leverage the new features and performance benefits of FlashAttention-2 on this card as soon as possible. Being able to use v2 is quite important for my work.

Could you share an estimated timeline for when FlashAttention-2 might officially support Turing GPUs, or any development plan or roadmap related to this?

Any update or information on this progress would be extremely helpful to me.

Looking forward to your reply. Thank you again for your time and effort on this great project!

wangsen-zy commented on Apr 22 '25

We're not actively working on Turing, but there's a version here: https://github.com/Dao-AILab/flash-attention/issues/1533

tridao commented on Apr 22 '25
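
For anyone on a Turing card in the meantime, a minimal fallback sketch (an illustration only, not an official recipe; it assumes the flash-attn 2.x `flash_attn_func` entry point and PyTorch 2.x): use the fused kernel where the hardware supports it, and fall back to PyTorch's built-in `scaled_dot_product_attention`, which runs on any architecture.

```python
import torch
import torch.nn.functional as F

try:
    # Public entry point of the flash-attn 2.x package; expects
    # (batch, seqlen, nheads, headdim) fp16/bf16 tensors on CUDA.
    from flash_attn import flash_attn_func
    HAVE_FLASH2 = True
except ImportError:
    HAVE_FLASH2 = False

def attention(q, k, v, causal=True):
    """Fused attention where available, portable SDPA otherwise."""
    if HAVE_FLASH2 and torch.cuda.get_device_capability() >= (8, 0):
        return flash_attn_func(q, k, v, causal=causal)
    # PyTorch's built-in SDPA wants (batch, nheads, seqlen, headdim),
    # so transpose around the call; it works on pre-Ampere GPUs too.
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    out = F.scaled_dot_product_attention(q, k, v, is_causal=causal)
    return out.transpose(1, 2)
```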

Hi team, do you have any tentative release date for this feature?

Jyothirmaikottu commented on Sep 03 '25

No, we're not working on Turing.

tridao commented on Sep 03 '25