flash-attention
Does flash attention support FP8?
FP8 is very useful for LLM training and inference. Does flash attention support FP8? Thank you~
Not yet for now
Hi @tridao, what do you think is the key difficulty in supporting FP8? And do you have any schedule for adding support in the future? Thank you~
The key difficulty is someone needs to implement it :D
Does it support keeping the KV cache in FP8 or INT8 while still computing in FP16?
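To illustrate what that question is describing, here is a minimal sketch (outside of flash-attention, which does not expose this) of an INT8-quantized KV cache that is dequantized back to FP16 before attention. The helper names `quantize_int8`/`dequantize_int8` and the symmetric per-head scaling are assumptions for illustration, not part of the flash-attention API.

```python
# Minimal sketch (not the flash-attention API): store the KV cache in INT8
# and dequantize to FP16 just before the attention computation.
import torch
import torch.nn.functional as F

def quantize_int8(x: torch.Tensor):
    """Symmetric per-head INT8 quantization: returns int8 values plus scales."""
    # x: (batch, heads, seq, dim); one scale per head, taken over seq and dim.
    scale = x.abs().amax(dim=(-2, -1), keepdim=True).clamp(min=1e-6) / 127.0
    q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(scale.dtype) * scale

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    b, h, s, d = 1, 8, 1024, 64
    q = torch.randn(b, h, 1, d, device=device, dtype=dtype)  # current query
    k = torch.randn(b, h, s, d, device=device, dtype=dtype)  # cached keys
    v = torch.randn(b, h, s, d, device=device, dtype=dtype)  # cached values

    # The cache is held in INT8 (half the memory of an FP16 cache).
    k_q, k_scale = quantize_int8(k)
    v_q, v_scale = quantize_int8(v)

    # Dequantize before attention, so the matmuls and softmax still run in FP16.
    out = F.scaled_dot_product_attention(
        q, dequantize_int8(k_q, k_scale), dequantize_int8(v_q, v_scale)
    )
    print(out.shape)  # torch.Size([1, 8, 1, 64])
```

With this scheme only the cache storage is low-precision; the attention kernel itself still sees FP16 inputs, which is distinct from running the attention math natively in FP8.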