ocannl
Consider implementing Lean Attention (Flash Attention + softmax-as-reduce)
https://arxiv.org/abs/2405.10480
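
For context, here is a minimal sketch of what "softmax-as-reduce" could mean in practice: each block of keys/values is summarized as a partial state (running max, sum of exponentials, unnormalized output), and two partial states are merged with an associative rescaling step, so the blocks can be reduced in any grouping. This is plain Python for illustration only, not OCANNL code and not the paper's kernel. The names `partial_state`, `combine`, and `attention_by_reduce` are hypothetical, and the example assumes a single query whose scores against the keys are already computed.

```python
import math
from functools import reduce


def partial_state(scores, values):
    """Summarize one block as (m, l, o): max score, sum of exp, unnormalized output."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    l = sum(weights)
    dim = len(values[0])
    o = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, l, o


def combine(a, b):
    """Merge two partial states by rescaling both to their shared max (associative)."""
    m_a, l_a, o_a = a
    m_b, l_b, o_b = b
    m = max(m_a, m_b)
    scale_a, scale_b = math.exp(m_a - m), math.exp(m_b - m)
    l = l_a * scale_a + l_b * scale_b
    o = [x * scale_a + y * scale_b for x, y in zip(o_a, o_b)]
    return m, l, o


def attention_by_reduce(scores, values, block_size):
    """Single-query attention computed as a reduction over per-block partial states."""
    states = [
        partial_state(scores[i:i + block_size], values[i:i + block_size])
        for i in range(0, len(scores), block_size)
    ]
    _, l, o = reduce(combine, states)
    return [x / l for x in o]  # normalize once, after the reduction


def attention_reference(scores, values):
    """Ordinary softmax-weighted sum over all positions, for comparison."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]
```

Because `combine` is associative, the per-block states could be merged in a tree or across workers rather than left to right; the sequential `reduce` here is only the simplest way to show that the blocked result matches the unblocked one.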