
Implementation of Rotary Embeddings, from the RoFormer paper, in PyTorch

18 rotary-embedding-torch issues, sorted by recently updated

Hi, @lucidrains! There was promising research published this month (vs. RoPE-mixed (#25) in March): the so-called LieRE positional encodings generalize the kv-vector rotation to any number of dimensions...
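For context, the core idea as I read the LieRE paper is to replace RoPE's fixed 2×2 block rotations with a learned rotation obtained by exponentiating a position-weighted sum of skew-symmetric generators, one generator per spatial axis. A minimal sketch of that idea; the class name, shapes, and initialization below are all hypothetical, not from the paper's code:

```python
import torch
import torch.nn as nn

class LieRERotation(nn.Module):
    """Hypothetical minimal LieRE-style rotation: one learned skew-symmetric
    generator per spatial axis, combined via the matrix exponential."""

    def __init__(self, head_dim: int, num_axes: int):
        super().__init__()
        self.raw = nn.Parameter(torch.randn(num_axes, head_dim, head_dim) * 0.02)

    def forward(self, pos: torch.Tensor) -> torch.Tensor:
        # pos: (..., num_axes) continuous coordinates, for any number of spatial dims
        skew = self.raw - self.raw.transpose(-1, -2)        # skew-symmetric generators
        gen = torch.einsum('...a,aij->...ij', pos, skew)    # position-weighted sum
        return torch.linalg.matrix_exp(gen)                 # rotation matrices in SO(head_dim)

# queries/keys would then be rotated per position, e.g. q' = R @ q
```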

Hi @lucidrains, These folks talk about improving axial RoPE performance. Some of their comparisons to axial RoPE look nice, but others don't convince me. I wanted to get your thoughts on this...

My conclusions about changing the positional encoding are that NOPE and ALiBi do not work well for encoder-only models because, compared to decoder-only models, they do not understand position at all (they...

Hi @lucidrains, Thanks for creating this wonderful package as well as `x-transformers`. I wanted to understand why rotary embeddings seem to be slower for me than absolute positional embeddings. I'm...
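A minimal timing harness for reproducing this, using the library's `rotate_queries_or_keys` entry point (shapes are illustrative). Some overhead over absolute embeddings is expected: rotary recomputes a per-position rotation of the queries and keys on every forward pass, whereas a learned absolute embedding is a single table lookup plus an add.

```python
import time
import torch
from rotary_embedding_torch import RotaryEmbedding

rotary_emb = RotaryEmbedding(dim = 32)

# mock queries: (batch, heads, seq_len, head_dim)
q = torch.randn(1, 8, 1024, 64)

start = time.perf_counter()
for _ in range(100):
    q_rot = rotary_emb.rotate_queries_or_keys(q)
print(f"100 rotary applications: {time.perf_counter() - start:.3f}s")
```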

`torch.compile` doesn't play nicely with AMP autocasting, and occasionally there are issues when exporting to ONNX or other formats. Would explicit casting to float and back be preferable? This appears...
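One way the suggested explicit round-trip could look; this is a sketch, with `apply_rope` standing in for whatever rotation function is being traced:

```python
import torch

def rotate_in_fp32(apply_rope, t: torch.Tensor) -> torch.Tensor:
    # Disable autocast and do the rotation in float32, then cast back, so
    # torch.compile and ONNX export see explicit, deterministic dtype handling.
    orig_dtype = t.dtype
    with torch.autocast(device_type = t.device.type, enabled = False):
        out = apply_rope(t.float())
    return out.to(orig_dtype)
```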

Hi! I'm running an enc-dec transformer with RoPE in the first self-attention layer of the encoder and decoder. I'm noticing that in the eval stage of my model, it hangs...

Hi @lucidrains, We have trained a 3D ViT masked autoencoder using axial RoPE for an image size of 512x512x512 (3D scientific images, sampled from much larger volumes). Now I want...
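For reference, the axial setup being described presumably follows the library's n-dimensional pattern, with `get_axial_freqs` taking one size per spatial axis. A small sketch with placeholder sizes (not the full 512³ volume):

```python
import torch
from rotary_embedding_torch import RotaryEmbedding, apply_rotary_emb

pos_emb = RotaryEmbedding(dim = 16, freqs_for = 'pixel', max_freq = 256)

# mock queries for a small 3D patch grid: (batch, depth, height, width, head_dim)
# head_dim must be dim * num_axes = 16 * 3 = 48 here
q = torch.randn(1, 8, 8, 8, 48)

# one frequency axis per spatial dimension
freqs = pos_emb.get_axial_freqs(8, 8, 8)

q = apply_rotary_emb(freqs, q)
```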

@lucidrains, it would be really helpful to have an implementation of YaRN [(Peng _et al._)](https://openreview.net/forum?id=wHBfxhZu1u) in this repository as well.
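For anyone wanting to experiment before an implementation lands, a rough sketch of YaRN's NTK-by-parts frequency interpolation: dimensions whose wavelength exceeds the original context are interpolated by the scale factor, high-frequency dimensions are left untouched, and a ramp blends between them. The hyperparameters `alpha`, `beta` and the context/scale values below are illustrative, following the paper's LLaMA settings:

```python
import math
import torch

def yarn_inv_freqs(dim, base = 10000.0, scale = 8.0, orig_ctx = 2048,
                   alpha = 1.0, beta = 32.0):
    # standard RoPE inverse frequencies: theta_d = base^(-2d / dim)
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # how many full wavelengths of each dimension fit into the original context
    ratio = orig_ctx / (2 * math.pi / inv_freq)
    # ramp: 0 -> fully interpolate (long wavelengths), 1 -> keep original (short)
    gamma = ((ratio - alpha) / (beta - alpha)).clamp(0.0, 1.0)
    return inv_freq / scale * (1 - gamma) + inv_freq * gamma

# the paper additionally scales attention, e.g. by multiplying the cos/sin
# embeddings by 0.1 * math.log(scale) + 1.0
```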