flash-attention
flash decoding algorithm numerical error
`combine_attn_seqk_parallel` doesn't calculate the global maximum score m and use it to properly rescale each O_i, so it might have more numerical error than v1 and v2.
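To make the point concrete, here is a minimal pure-Python sketch of the kind of combine step I mean. It is not the code in the repository: `partial_attention` and `combine_splits` are hypothetical names, and I'm assuming each split i returns its normalized output O_i together with the log-sum-exp LSE_i of its scores. The combine subtracts the global maximum of the LSE_i before exponentiating and rescales each O_i by the resulting weight.

```python
import math
import random


def partial_attention(q, keys, values):
    """Softmax attention of one query over one split of keys/values.

    Returns the normalized output O_i and the log-sum-exp LSE_i of the scores.
    """
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m_i = max(scores)                          # local maximum for this split
    p = [math.exp(s - m_i) for s in scores]    # shifted exponentials, all <= 1
    l_i = sum(p)
    dim = len(values[0])
    out = [sum(pj * v[c] for pj, v in zip(p, values)) / l_i for c in range(dim)]
    return out, m_i + math.log(l_i)


def combine_splits(outs, lses):
    """Merge per-split outputs, rescaling each O_i relative to the global maximum."""
    m = max(lses)                              # global maximum over all splits
    w = [math.exp(x - m) for x in lses]        # rescaling weights, all <= 1
    total = sum(w)
    dim = len(outs[0])
    return [sum(wi * o[c] for wi, o in zip(w, outs)) / total for c in range(dim)]


def split_attention(q, keys, values, num_splits):
    """Run attention split along the key sequence, then combine the pieces."""
    chunk = len(keys) // num_splits
    parts = [
        partial_attention(q, keys[i:i + chunk], values[i:i + chunk])
        for i in range(0, len(keys), chunk)
    ]
    return combine_splits([o for o, _ in parts], [lse for _, lse in parts])


random.seed(0)
d, n = 8, 64
q = [random.gauss(0, 1) for _ in range(d)]
keys = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
values = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]

ref, _ = partial_attention(q, keys, values)    # unsplit attention as reference
out = split_attention(q, keys, values, num_splits=4)
max_abs_err = max(abs(a - b) for a, b in zip(out, ref))
print("max abs error vs. unsplit reference:", max_abs_err)
```

This runs in double precision on the CPU, so it only shows the arithmetic of the rescaling; it says nothing about how large the error is in the actual fp16/bf16 kernel.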
@tridao
Can you give a short script showing the numerical error?