Integrate lucidrains' RotaryEmbedding
See https://github.com/lucidrains/rotary-embedding-torch/blob/main/rotary_embedding_torch/rotary_embedding_torch.py
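For reference, the usage pattern from that repo looks roughly like this (a minimal sketch based on its README; the tensor shapes are placeholders, not metaseq code):

```python
import torch
from rotary_embedding_torch import RotaryEmbedding

# Rotary embedding over (a slice of) the per-head dimension.
rotary_emb = RotaryEmbedding(dim=32)

# Dummy queries/keys: (batch, heads, seq_len, head_dim).
q = torch.randn(1, 8, 1024, 64)
k = torch.randn(1, 8, 1024, 64)

# Rotations are applied to q and k just before the attention dot product.
q = rotary_emb.rotate_queries_or_keys(q)
k = rotary_emb.rotate_queries_or_keys(k)
```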
And from the PaLM paper:

"We use RoPE embeddings (Su et al., 2021) rather than absolute or relative position embeddings, since RoPE embeddings have been shown to have better performance on long sequence lengths."
Since you already have xformers as a soft dependency, you should be able to pull the rotary implementation in directly from there when it's installed (similar to how flash attention is handled).
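Concretely, the soft-dependency guard could look something like this (a sketch only; the import path matches the xformers revision linked below, and `build_rotary` is a hypothetical helper, not an existing metaseq function):

```python
try:
    # Only available if the user has installed xformers.
    from xformers.components.positional_embedding.rotary import RotaryEmbedding

    has_xformers_rotary = True
except ImportError:
    has_xformers_rotary = False


def build_rotary(head_dim: int):
    # Prefer the xformers implementation when present, mirroring how
    # flash attention is picked up opportunistically.
    if has_xformers_rotary:
        return RotaryEmbedding(head_dim)
    raise ImportError("rotary embeddings require xformers to be installed")
```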
It would be nice to expose the extra configurability of the lucidrains version (theta, freqs_for, max_freq, num_freqs):
https://github.com/lucidrains/rotary-embedding-torch/blob/6868f6ff30898989e4aa5890973911b2edc5e8d8/rotary_embedding_torch/rotary_embedding_torch.py#L60-L72
versus the fixed frequency schedule in the xformers version:
https://github.com/facebookresearch/xformers/blob/bc08bbc631348913a3c37b4e09832973ff93a398/xformers/components/positional_embedding/rotary.py#L49-L57
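The difference in those two snippets is essentially how the inverse frequencies are built. A minimal sketch of the two parameterizations (parameter names taken from the lucidrains code linked above; theta=10000 is the default in both):

```python
import math
import torch

dim = 64  # rotary dimension (per head)

# xformers: language-model frequencies only, theta hard-coded to 10000.
inv_freq_xformers = 1.0 / (10000 ** (torch.arange(0, dim, 2).float() / dim))

# lucidrains: same 'lang' default, but theta, freqs_for, max_freq, and
# num_freqs are all constructor arguments, e.g. a 'pixel' mode for images.
theta, max_freq = 10000, 10
freqs_lang = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))
freqs_pixel = torch.linspace(1.0, max_freq / 2, dim // 2) * math.pi
```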