Mega-pytorch

Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena

Results: 1 Mega-pytorch issue

1. For https://arxiv.org/pdf/2209.10655.pdf#page=21, why use `x = sqrt(2)` specifically? Wouldn't it be easier to just use `x = 1`? ![image](https://user-images.githubusercontent.com/3324659/192152144-21ebc6af-f898-4312-b939-f51f1c6916d5.png)
2. In https://arxiv.org/pdf/2109.08668.pdf#page=5, I do...