flash-linear-attention Quick question: Is there a non-causal optimized form of Flash Linear Attention?

Open yzeng58 opened this issue 1 year ago • 6 comments

Great work!

It appears that both GLA and RetNet are optimized only for the causal case. Is there an optimized linear attention implementation for non-causal (bidirectional) scenarios?
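To make the question concrete, here is a minimal NumPy sketch of what I mean by non-causal linear attention. It is only an illustration under my own assumptions, not this library's API: the function names are hypothetical, and the `elu(x) + 1` feature map is just one common choice. Without a causal mask, the key-value summary can be computed once over the whole sequence and shared by every query, so no chunked or recurrent scan is needed.

```python
import numpy as np


def feature_map(x):
    # elu(x) + 1: a positive feature map (an assumed choice, not from the library).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def noncausal_linear_attention(q, k, v, eps=1e-6):
    """q, k: (T, d_k); v: (T, d_v). Every query attends to every key."""
    q, k = feature_map(q), feature_map(k)
    kv = k.T @ v           # (d_k, d_v): summary over all T positions
    z = k.sum(axis=0)      # (d_k,): normalizer over all T positions
    num = q @ kv           # (T, d_v)
    den = q @ z + eps      # (T,)
    return num / den[:, None]


def quadratic_reference(q, k, v, eps=1e-6):
    """Same result via the explicit (T, T) attention matrix, for checking."""
    q, k = feature_map(q), feature_map(k)
    a = q @ k.T            # (T, T) unnormalized attention weights
    return (a @ v) / (a.sum(axis=1, keepdims=True) + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((8, 4))
    k = rng.standard_normal((8, 4))
    v = rng.standard_normal((8, 3))
    out = noncausal_linear_attention(q, k, v)
    print(out.shape, np.allclose(out, quadratic_reference(q, k, v)))
```

The linear form costs O(T · d_k · d_v) rather than O(T² · d), so I am mainly asking whether a fused kernel for this case exists or whether plain matmuls like these are already the recommended path.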

Jul 08 '24 22:07 yzeng58