BS-RoFormer
Linear Attention
Hello, I have two questions about the Linear Attention that was added later. Can you clarify why it is called Linear Attention when the referenced paper introduces Cross-Covariance Attention, and why exactly is it better than Self-Attention?
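For reference, here is a minimal NumPy sketch of cross-covariance (channel-wise) attention in the spirit of the XCiT paper, not the repo's actual implementation. Single head, no temperature parameter; `cross_covariance_attention` and the weight arguments are names made up for this illustration. The attention map is `(dim, dim)` over feature channels rather than `(n, n)` over tokens, which is presumably why its cost is described as linear in sequence length:

```python
import numpy as np

def softmax(z, axis=-1):
    # numerically stable softmax
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_covariance_attention(x, wq, wk, wv):
    """x: (n_tokens, dim); wq/wk/wv: (dim, dim) projection weights.
    The attention map is (dim, dim), not (n_tokens, n_tokens), so the
    cost grows linearly with the number of tokens."""
    q, k, v = x @ wq, x @ wk, x @ wv
    # L2-normalize each channel column over the token axis, as in XCiT,
    # so the channel attention map stays well-scaled for any n_tokens
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    k = k / (np.linalg.norm(k, axis=0, keepdims=True) + 1e-8)
    attn = softmax(q.T @ k, axis=-1)  # (dim, dim) channel attention
    return v @ attn.T                 # (n_tokens, dim) output
```

In ordinary self-attention the `softmax(q @ k.T)` map is `(n_tokens, n_tokens)`, so compute and memory grow quadratically with sequence length; here the quadratic term is in the (fixed) channel dimension instead.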