Does flash-attention support an fp8 or int8 KV cache while still computing in fp16?

Open KnightYao opened this issue 1 year ago • 1 comment

Does flash-attention support storing the KV cache in fp8 or int8 while the computation is still done in fp16? Reading the KV cache as int8 is faster than reading it as fp16; the int8 values could then be converted to fp16 in shared memory before the computation.

KnightYao, Jun 26 '24 09:06

Not yet. PRs are welcome.

tridao, Jun 26 '24 17:06
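
For readers unfamiliar with the idea in the question, here is a minimal sketch of the numerics only: K and V rows are stored as int8 with one scale per row, then converted back to fp16-rounded values before attention is computed. It is plain Python, not flash-attention code and not a CUDA kernel. The helper names (`to_fp16`, `quantize_row`, `dequantize_row`, `attention`), the symmetric per-row scaling scheme, and the toy data are all assumptions made for illustration.

```python
import math
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def quantize_row(row):
    """Symmetric per-row int8 quantization (assumed scheme): (int8 values, scale)."""
    amax = max(abs(v) for v in row)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in row]
    return q, scale

def dequantize_row(q, scale):
    """Convert stored int8 values back to fp16-rounded floats."""
    return [to_fp16(v * scale) for v in q]

def attention(q_vec, keys, values):
    """Single-query softmax attention; accumulation uses ordinary Python floats."""
    d = len(q_vec)
    scores = [sum(a * b for a, b in zip(q_vec, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]

# Toy data, made up for this sketch.
keys = [[0.5, -1.0, 0.25, 2.0], [1.5, 0.0, -0.75, 0.1], [-0.3, 0.8, 0.9, -1.2]]
values = [[1.0, 2.0, 3.0, 4.0], [-1.0, 0.5, 0.0, 2.5], [0.2, -0.4, 0.6, -0.8]]
query = [0.1, 0.2, -0.3, 0.4]

# "Cache" holds int8 values plus one scale per row.
k_cache = [quantize_row(r) for r in keys]
v_cache = [quantize_row(r) for r in values]

# Dequantize on read, then run attention on the fp16-rounded values.
k_deq = [dequantize_row(q, s) for q, s in k_cache]
v_deq = [dequantize_row(q, s) for q, s in v_cache]

reference = attention(query, keys, values)
approx = attention(query, k_deq, v_deq)
max_err = max(abs(a - b) for a, b in zip(reference, approx))
print("max abs error vs. unquantized attention:", max_err)
```

In a real kernel the dequantization would happen on chip after the int8 load, which is where the bandwidth saving described in the question would come from; this sketch only shows that the result stays close to the unquantized one on small inputs.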