
Does flash attention support FP8?

Open Godlovecui opened this issue 1 year ago • 4 comments

FP8 is very useful for LLM training and inference. Does flash attention support FP8? Thank you~

Godlovecui avatar Jun 11 '24 08:06 Godlovecui

Not yet for now

tridao avatar Jun 11 '24 19:06 tridao
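In the meantime, a minimal workaround sketch, assuming the installed flash-attn accepts only fp16/bf16 inputs: keep tensors in FP8 for storage and upcast to bf16 right before the call, so FP8 saves memory and bandwidth elsewhere while the attention kernel itself still runs in bf16. The shapes and the `torch.float8_e4m3fn` casting here are illustrative, not a flash-attn FP8 API.

```python
# Sketch: upcast FP8 storage to bf16 before calling flash-attn,
# since the kernel (as of this thread) does not accept FP8 inputs.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 16, 64

# FP8 tensors for storage (torch.float8_e4m3fn requires a recent PyTorch).
q8 = torch.randn(batch, seqlen, nheads, headdim, device="cuda").to(torch.float8_e4m3fn)
k8 = torch.randn(batch, seqlen, nheads, headdim, device="cuda").to(torch.float8_e4m3fn)
v8 = torch.randn(batch, seqlen, nheads, headdim, device="cuda").to(torch.float8_e4m3fn)

# Upcast to bf16: FP8 is only the storage format here,
# the attention computation itself runs in bf16.
q, k, v = (t.to(torch.bfloat16) for t in (q8, k8, v8))
out = flash_attn_func(q, k, v, causal=True)
```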

> Not yet for now

Hi @tridao, what do you think is the key difficulty in supporting FP8? Or do you have any schedule for supporting it in the future? Thank you~

Godlovecui avatar Jun 12 '24 05:06 Godlovecui

The key difficulty is someone needs to implement it :D

tridao avatar Jun 12 '24 05:06 tridao

> The key difficulty is someone needs to implement it :D

Does it support an FP8 or INT8 KV cache, with the computation still done in FP16?

KnightYao avatar Jun 26 '24 09:06 KnightYao
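A minimal sketch of the pattern this comment asks about, assuming flash-attn itself cannot consume a quantized cache: store K/V in INT8 with a per-head scale, then dequantize to fp16 just before the attention call, so only the compute runs in fp16. The `quantize_kv`/`dequantize_kv` helpers are hypothetical, not part of flash-attn.

```python
# Sketch: INT8 KV-cache storage with fp16 compute.
# quantize_kv / dequantize_kv are hypothetical helpers, not flash-attn APIs.
import torch
from flash_attn import flash_attn_func

def quantize_kv(x: torch.Tensor):
    # Symmetric per-head INT8 quantization over the head dimension.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-6) / 127.0
    return (x / scale).round().clamp(-127, 127).to(torch.int8), scale

def dequantize_kv(x_int8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Back to fp16 right before the attention call.
    return x_int8.to(torch.float16) * scale

batch, seqlen, nheads, headdim = 2, 1024, 16, 64
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

k_int8, k_scale = quantize_kv(k)  # cached compactly in INT8
v_int8, v_scale = quantize_kv(v)

# The attention computation itself still happens in fp16.
out = flash_attn_func(
    q,
    dequantize_kv(k_int8, k_scale),
    dequantize_kv(v_int8, v_scale),
    causal=True,
)
```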