
Type of gemm.

Open gaodaheng opened this issue 1 year ago • 3 comments

For all GEMMs in flash attention (including forward and backward), are the inputs fp16/bf16 (both the left and right matrices) and the output fp32?

gaodaheng avatar Dec 06 '23 02:12 gaodaheng

Yes that's right.

tridao avatar Dec 06 '23 06:12 tridao

If the input is bf16, should the output still be fp32? For example, when q and k are bf16, can q*k^T output the bf16 dtype? @tridao

fate08301017 avatar Jul 10 '24 06:07 fate08301017

Yes, q@k^T is in fp32, softmax is done in fp32, and the result is then converted to bf16 to do the GEMM with V.

tridao avatar Jul 10 '24 06:07 tridao
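
For readers who want to see the dtype flow described above in one place, here is a minimal pure-Python sketch. It is not the flash-attention kernel: the helper names (`to_fp32`, `to_bf16`, `gemm_fp32_acc`, `softmax_fp32`, `attention`) are hypothetical, bf16 and fp32 are emulated by rounding Python floats, and tiling, the softmax scale, masking, and any final cast of the output are left out because the thread does not discuss them.

```python
import math
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Emulate bf16: keep the top 16 bits of the fp32 pattern,
    rounding to nearest even. NaN/inf handling is ignored in this sketch."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round-to-nearest-even bias
    bits &= 0xFFFF0000                   # drop the low 16 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def gemm_fp32_acc(a, b):
    """Multiply a (m x k) by b (k x n). Inputs are assumed to hold
    bf16-representable values; the accumulator is rounded to fp32."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for t in range(k):
                acc = to_fp32(acc + a[i][t] * b[t][j])
            out[i][j] = acc
    return out


def softmax_fp32(row):
    """Row softmax with every intermediate rounded to fp32."""
    row_max = max(row)
    exps = [to_fp32(math.exp(x - row_max)) for x in row]
    total = to_fp32(sum(exps))
    return [to_fp32(e / total) for e in exps]


def attention(q, k, v):
    """bf16 inputs -> fp32 scores -> fp32 softmax -> bf16 P -> fp32 P@V."""
    q = [[to_bf16(x) for x in r] for r in q]
    k = [[to_bf16(x) for x in r] for r in k]
    v = [[to_bf16(x) for x in r] for r in v]
    k_t = [list(col) for col in zip(*k)]            # k^T
    scores = gemm_fp32_acc(q, k_t)                  # q @ k^T, fp32 output
    probs = [softmax_fp32(r) for r in scores]       # softmax in fp32
    probs_bf16 = [[to_bf16(p) for p in r] for r in probs]  # cast before GEMM with V
    return gemm_fp32_acc(probs_bf16, v)             # fp32 accumulator


if __name__ == "__main__":
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    print(attention([[0.0, 0.0]], k, v))
```

With an all-zero query the softmax is uniform, so the call in the `__main__` block averages the two rows of `v`. The only place precision is deliberately dropped between the two GEMMs is the cast of the softmax probabilities to bf16, which matches the order of operations given in the reply above.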