metaseq

Integrate lucidrains' RotaryEmbeddings

Open • suchenzang opened this issue 2 years ago • 2 comments

See https://github.com/lucidrains/rotary-embedding-torch/blob/main/rotary_embedding_torch/rotary_embedding_torch.py

And from the PaLM paper:

We use RoPE embeddings (Su et al., 2021) rather than absolute or relative position embeddings, since RoPE embeddings have been shown to have better performance on long sequence lengths.
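For readers unfamiliar with RoPE, here is a minimal pure-Python sketch of the rotation described in Su et al., 2021. It is not the lucidrains or xformers implementation: the function names `rope` and `dot`, the interleaved pairing of dimensions, and the default base of 10000 are assumptions chosen for illustration. Each pair of dimensions is rotated by an angle proportional to the token position, so the dot product between a rotated query and a rotated key depends only on their relative offset.

```python
import math


def rope(x, pos, base=10000.0):
    """Rotate consecutive pairs (x[2i], x[2i+1]) of vector x by pos * theta_i.

    theta_i = base ** (-2i / d), following the usual RoPE formulation.
    Hypothetical helper for illustration only.
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even number of dimensions")
    out = []
    for i in range(d // 2):
        angle = pos * base ** (-2.0 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        # Standard 2-D rotation of the pair (a, b) by `angle`.
        out.extend([a * c - b * s, a * s + b * c])
    return out


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]

# Both pairs of positions are 2 apart, so the attention scores should match.
score_near = dot(rope(q, 5), rope(k, 3))
score_far = dot(rope(q, 12), rope(k, 10))
```

Because only the offset between positions enters the score, the same rotation applies to sequences of any length without a learned position table.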

suchenzang, Jan 27 '23 07:01

Since you already have xformers as a soft dependency, you should be able to pull it in directly from there if it's installed (similar to flash attention).
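A soft-dependency import along these lines could be sketched as follows. The helper `load_optional` and the names `XFORMERS_ROTARY` and `HAS_XFORMERS_ROTARY` are hypothetical and not part of metaseq. The module path is taken from the xformers file linked in this thread, and the class name `RotaryEmbedding` is an assumption about that file that may not hold in every xformers release.

```python
import importlib


def load_optional(module_path, attr):
    """Return `attr` from `module_path`, or None if it cannot be loaded.

    Hypothetical helper: any failure while importing (missing package,
    broken install) is treated as "dependency not available".
    """
    try:
        module = importlib.import_module(module_path)
    except Exception:
        return None
    return getattr(module, attr, None)


# Assumed location and class name; adjust to the installed xformers version.
XFORMERS_ROTARY = load_optional(
    "xformers.components.positional_embedding.rotary", "RotaryEmbedding"
)
HAS_XFORMERS_ROTARY = XFORMERS_ROTARY is not None
```

Callers can then branch on `HAS_XFORMERS_ROTARY` and fall back to another positional embedding when xformers is absent.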

erip, Jan 27 '23 19:01

It would be nice to have the extra configurability of the lucidrains implementation: https://github.com/lucidrains/rotary-embedding-torch/blob/6868f6ff30898989e4aa5890973911b2edc5e8d8/rotary_embedding_torch/rotary_embedding_torch.py#L60-L72

vs. the xformers one:

https://github.com/facebookresearch/xformers/blob/bc08bbc631348913a3c37b4e09832973ff93a398/xformers/components/positional_embedding/rotary.py#L49-L57

suchenzang, Jan 27 '23 20:01