rotary-embedding-torch
Length Extrapolatable Rotary Embeddings
Hi! I'm interested in using the rotary embeddings with use_xpos=True so that my transformer is length-extrapolatable. However, I noticed the README mentions this technique works only with autoregressive transformers. Is there a reason it wouldn't work with an encoder-only bidirectional transformer?
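For reference, here is roughly how I'm setting things up, following the README's xpos example (the dimensions and tensor shapes below are just illustrative):

```python
import torch
from rotary_embedding_torch import RotaryEmbedding

rotary_emb = RotaryEmbedding(dim = 32, use_xpos = True)

# queries and keys for an attention layer: (batch, heads, seq_len, head_dim)
q = torch.randn(1, 8, 1024, 64)
k = torch.randn(1, 8, 1024, 64)

# with use_xpos = True, queries and keys are rotated together so the
# length-dependent scaling cancels in the dot product
q, k = rotary_emb.rotate_queries_and_keys(q, k)
```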
Thanks!