flash-attention
Does it support an FP8 or INT8 KV cache while still computing in FP16?
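Is it possible to store the KV cache in FP8 or INT8 while still doing the computation in FP16? Reading the KV cache as INT8 should be faster than reading it as FP16, since each element takes 1 byte instead of 2. The INT8 values would then be converted to FP16 in shared memory, and the attention would be computed from there.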
Not yet. PRs are welcome.
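To make the proposal concrete, here is a minimal pure-Python sketch of its numerics only, assuming symmetric per-row INT8 quantization with one scale per cached key or value vector. The question does not specify a scaling scheme, so that choice is an assumption. K and V are stored as INT8 values plus a scale, dequantized to FP16 (emulated by round-tripping through IEEE half precision with `struct`) before use, and then fed to ordinary single-query softmax attention. This is not a FlashAttention kernel and says nothing about speed. All function names (`quantize_int8`, `dequantize_fp16`, `attention_int8_kv`, and so on) are hypothetical.

```python
import math
import struct
from typing import List, Sequence, Tuple


def to_fp16(x: float) -> float:
    """Round a float to the nearest IEEE 754 half-precision value (FP16 emulation)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_int8(row: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-row INT8 quantization (assumed scheme): max |x| maps to 127."""
    amax = max(abs(v) for v in row)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in row]
    return q, scale


def dequantize_fp16(q: Sequence[int], scale: float) -> List[float]:
    """The conversion the question proposes doing after the load: INT8 -> FP16."""
    return [to_fp16(v * scale) for v in q]


def softmax_weighted_sum(scores: Sequence[float], values: Sequence[Sequence[float]]) -> List[float]:
    m = max(scores)
    w = [math.exp(s - m) for s in scores]  # subtract the max for numerical stability
    total = sum(w)
    out = [0.0] * len(values[0])
    for wi, v in zip(w, values):
        for j, vj in enumerate(v):
            out[j] += (wi / total) * vj
    return out


def attention_reference(query, keys, values) -> List[float]:
    """Plain single-query scaled dot-product attention on unquantized K/V."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(a * b for a, b in zip(query, k)) for k in keys]
    return softmax_weighted_sum(scores, values)


def attention_int8_kv(query, k_cache, v_cache) -> List[float]:
    """Same attention, but K/V come from an INT8 cache of (values, scale) pairs.

    The cached rows are dequantized to FP16 before the attention math. The dot
    products and accumulation here run in Python floats (double precision); a
    real GPU kernel would typically accumulate in FP32.
    """
    keys = [dequantize_fp16(q, s) for q, s in k_cache]
    values = [dequantize_fp16(q, s) for q, s in v_cache]
    return attention_reference(query, keys, values)


if __name__ == "__main__":
    query = [0.3, -0.2, 0.8, 0.1]
    keys = [[0.5, -1.0, 0.25, 2.0], [1.5, 0.1, -0.3, 0.0], [-0.7, 0.9, 1.1, -0.4]]
    values = [[1.0, 0.0, -1.0, 0.5], [0.2, 0.3, 0.4, 0.5], [-1.0, 2.0, 0.0, 1.0]]

    k_cache = [quantize_int8(k) for k in keys]
    v_cache = [quantize_int8(v) for v in values]

    ref = attention_reference(query, keys, values)
    approx = attention_int8_kv(query, k_cache, v_cache)
    print("max abs error:", max(abs(a - b) for a, b in zip(ref, approx)))
```

The per-row scale is the simplest choice that keeps the dequantization a single multiply per element. Per-channel or per-block scales are common alternatives that trade extra scale storage for lower quantization error.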