ring-flash-attention
Precision issue
There are some arithmetic errors with the current implementation. The reason for them is probably that flash attention returns bf16 values for each block, so we cannot accumulate the values with the original fp32 ones.
If bf16 precision is used throughout instead of fp32, then wouldn't the problem of being unable to "accumulate the values with the original fp32 ones" simply not exist?
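The concern in the quoted passage can be sketched with a small pure-Python experiment. This is only an illustration under stated assumptions, not the repository's code: `to_bf16`, `block_attention`, `merge`, and `blockwise_attention` are hypothetical helpers, a single query with scalar values stands in for real attention, and Python floats (doubles) stand in for the higher-precision accumulator. Each block's output is optionally rounded to bf16 before it is merged into the running result, which mimics a kernel that hands back bf16 per block.

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bf16 value (hypothetical helper, round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bias = 0x7FFF + ((bits >> 16) & 1)      # ties go to the even bf16 mantissa
    bits = (bits + bias) & 0xFFFF0000       # keep only the upper 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def block_attention(scores, values):
    """Softmax-weighted average over one block, plus its log-sum-exp."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    out = sum(w * v for w, v in zip(weights, values)) / denom
    return out, m + math.log(denom)


def merge(out, lse, block_out, block_lse):
    """Combine the running result with a new block using log-sum-exp weights."""
    new_lse = max(lse, block_lse) + math.log1p(math.exp(-abs(lse - block_lse)))
    merged = (math.exp(lse - new_lse) * out
              + math.exp(block_lse - new_lse) * block_out)
    return merged, new_lse


def blockwise_attention(scores, values, block_size, round_blocks):
    """Process keys block by block; optionally round each block output to bf16."""
    out, lse = None, None
    for start in range(0, len(scores), block_size):
        b_out, b_lse = block_attention(scores[start:start + block_size],
                                       values[start:start + block_size])
        if round_blocks:
            b_out = to_bf16(b_out)          # the per-block precision loss
        if out is None:
            out, lse = b_out, b_lse
        else:
            out, lse = merge(out, lse, b_out, b_lse)
    return out


# Made-up data purely for illustration.
scores = [math.sin(i) for i in range(16)]
values = [math.cos(0.5 * i) for i in range(16)]

reference, _ = block_attention(scores, values)
exact = blockwise_attention(scores, values, block_size=4, round_blocks=False)
rounded = blockwise_attention(scores, values, block_size=4, round_blocks=True)

print("error without per-block rounding:", abs(exact - reference))
print("error with bf16 per-block rounding:", abs(rounded - reference))
```

In this sketch the accumulator itself stays in higher precision in both runs; the only difference is whether each block's contribution has already been rounded to bf16 before it is merged, which is the situation the quoted passage describes.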