flash-linear-attention
Quick question: Is there a non-causal optimized form of Flash Linear Attention?
Great work!
It appears that both GLA and RetNet are optimized only for the causal case. Is there an optimized linear attention implementation for non-causal (bidirectional) scenarios?
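To make the question concrete, here is a minimal NumPy sketch of what the non-causal case could look like, assuming plain unnormalized linear attention with no feature map, gating, or decay. The function names are hypothetical and are not part of this repository's API. Without a causal mask, the product can be reassociated as Q(KᵀV), so no chunked or recurrent scan is needed.

```python
import numpy as np

def noncausal_linear_attention(q, k, v):
    """Unnormalized linear attention without a causal mask (illustrative only).

    q, k: arrays of shape (T, d_k); v: array of shape (T, d_v).
    """
    kv = k.T @ v   # (d_k, d_v) summary over all positions, cost O(T * d_k * d_v)
    return q @ kv  # (T, d_v); every query sees every key

def causal_linear_attention(q, k, v):
    """Quadratic reference for the causal case, for comparison only."""
    scores = np.tril(q @ k.T)  # zero out contributions from future positions
    return scores @ v

rng = np.random.default_rng(0)
T, d_k, d_v = 8, 4, 5
q = rng.standard_normal((T, d_k))
k = rng.standard_normal((T, d_k))
v = rng.standard_normal((T, d_v))

out = noncausal_linear_attention(q, k, v)
ref = (q @ k.T) @ v  # quadratic form of the same computation
causal_out = causal_linear_attention(q, k, v)
```

The last row of the causal output matches the last row of the non-causal one, since the final position attends to everything in both cases. What I am unsure about is whether a fused kernel would bring any benefit over two plain matmuls here.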