rectified-linear-attention
Sparse Attention with Linear Units
Rectified Linear Attention
This repo contains a PyTorch implementation of Sparse Attention with Linear Units (ReLA). It is not the official repo, so some details may differ from the paper.
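The idea of ReLA is to drop the softmax in scaled dot-product attention and apply a ReLU to the scores instead, which yields sparse, unnormalized attention weights; an RMSNorm on the attention output then keeps activations stable. Below is a minimal, unofficial sketch of this in PyTorch, assuming illustrative names (`ReLAttention`, `dim`, `heads`, `dim_head`); the actual module in this repo and the gated variant from the paper may differ in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReLAttention(nn.Module):
    """Illustrative rectified linear attention: ReLU scores + RMSNorm on the output."""
    def __init__(self, dim, heads=8, dim_head=64):
        super().__init__()
        inner_dim = heads * dim_head
        self.heads = heads
        self.scale = dim_head ** -0.5
        self.to_qkv = nn.Linear(dim, inner_dim * 3, bias=False)
        # RMSNorm on each head's output compensates for the missing softmax normalization.
        # nn.RMSNorm needs a recent PyTorch (>=2.4); older versions need a custom module.
        self.norm = nn.RMSNorm(dim_head)
        self.to_out = nn.Linear(inner_dim, dim)

    def forward(self, x):
        b, n, _ = x.shape
        qkv = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = (t.reshape(b, n, self.heads, -1).transpose(1, 2) for t in qkv)
        # ReLU replaces softmax: negative scores are zeroed, giving sparse weights.
        attn = F.relu(q @ k.transpose(-2, -1) * self.scale)
        out = self.norm(attn @ v)
        out = out.transpose(1, 2).reshape(b, n, -1)
        return self.to_out(out)

# Example usage with a hypothetical model dimension of 512:
# x = torch.randn(1, 16, 512)
# out = ReLAttention(512)(x)  # -> (1, 16, 512)
```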
Citation:
```bibtex
@misc{zhang2021sparse,
    title={Sparse Attention with Linear Units},
    author={Biao Zhang and Ivan Titov and Rico Sennrich},
    year={2021},
    eprint={2104.07012},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
References:
- The Transformer components and the initial attention code are from lucidrains' vit-pytorch.
- The RMSNorm code is from this repo.