
Turing GPU support

Open sumanthnallamotu opened this issue Dec 12 '23 · 9 comments

In reference to the following on the main page:

"FlashAttention-2 currently supports: Ampere, Ada, or Hopper GPUs (e.g., A100, RTX 3090, RTX 4090, H100). Support for Turing GPUs (T4, RTX 2080) is coming soon, please use FlashAttention 1.x for Turing GPUs for now. Datatype fp16 and bf16 (bf16 requires Ampere, Ada, or Hopper GPUs). All head dimensions up to 256. Head dim > 192 backward requires A100/A800 or H100/H800."

How soon can we expect support for Turing GPUs? Some of the models I'd like to use are based on Mistral, which requires FlashAttention v2.

Specifically, I'm looking for support with T4.
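In the meantime, a minimal fallback sketch (my own workaround, not an official recipe): gate FlashAttention-2 on the GPU's compute capability, since Turing (T4, RTX 2080) is sm 7.5 and FlashAttention-2 requires sm 8.0+ (Ampere/Ada/Hopper). This assumes a recent Hugging Face transformers version that accepts the attn_implementation argument; the checkpoint name is only an example.

```python
import torch
from transformers import AutoModelForCausalLM

# Turing is compute capability (7, 5); FlashAttention-2 needs (8, 0) or newer.
major, minor = torch.cuda.get_device_capability()
if (major, minor) >= (8, 0):
    attn = "flash_attention_2"  # Ampere/Ada/Hopper: use FlashAttention-2
else:
    attn = "sdpa"  # Turing fallback: PyTorch scaled_dot_product_attention

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # example checkpoint only
    torch_dtype=torch.float16,    # fp16; bf16 needs Ampere or newer anyway
    attn_implementation=attn,
)
```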

Thank you!

sumanthnallamotu avatar Dec 12 '23 16:12 sumanthnallamotu

Me too!

WingsLong avatar Dec 19 '23 11:12 WingsLong

Me too!

IvoryTower800 avatar Dec 28 '23 03:12 IvoryTower800

Me too!

online2311 avatar Dec 31 '23 04:12 online2311

me too, too LoL

laoda513 avatar Jan 24 '24 03:01 laoda513

Duplicate of #542

mirh avatar Aug 06 '24 10:08 mirh

Me too!

zbh2047 avatar Nov 09 '24 04:11 zbh2047

And me too :+1:

giopaglia avatar Nov 09 '24 16:11 giopaglia

Any progress?

vigourwu avatar Dec 10 '24 06:12 vigourwu

me too

kaiyuanlee avatar Jun 05 '25 15:06 kaiyuanlee