rotary-embedding-torch
Implementation of Rotary Embeddings, from the RoFormer paper, in PyTorch
Hi, thank you for sharing this code with us. However, I was confused by the axial rotary embeddings in the rotary_embedding_torch.py file. " elif freqs_for == 'pixel': freqs = torch.linspace(1., max_freq...
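For context, a minimal sketch of how the two frequency modes differ (the `'lang'` formula follows the RoFormer paper; the `'pixel'` line matches the snippet quoted above):

```python
import torch
from math import pi

dim, max_freq = 32, 10

# 'lang': inverse-power frequencies over integer token positions, as in RoFormer
lang_freqs = 1. / (10000 ** (torch.arange(0, dim, 2).float() / dim))

# 'pixel': linearly spaced frequencies scaled by pi, intended for coordinates
# normalized to [-1, 1] rather than integer token positions
pixel_freqs = torch.linspace(1., max_freq / 2, dim // 2) * pi
```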
Two issues in one, as they seemingly come from the same function. Right now, if I `torch.jit.trace` a module that uses `rotate_queries_or_keys()`, I hit the following `TracerWarning`: ``` TracerWarning: Converting...
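A minimal sketch of a tracing setup that should surface the warning, assuming a plain wrapper module (the warning presumably comes from tensor shapes being converted to Python values inside the frequency-cache logic):

```python
import torch
from rotary_embedding_torch import RotaryEmbedding

class QueryRotator(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.rotary = RotaryEmbedding(dim = 32)

    def forward(self, q):
        return self.rotary.rotate_queries_or_keys(q)

q = torch.randn(1, 8, 128, 64)
traced = torch.jit.trace(QueryRotator(), q)  # emits the TracerWarning described above
```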
Is it possible to easily use axial rotary embeddings with your x-transformers without having to dissect the Attention module? At first glance it seems that there is no simple way...
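For reference, a hedged sketch of applying axial rotary embeddings to queries outside any attention module, assuming the `get_axial_freqs` and `apply_rotary_emb` helpers this library exposes for pixel data:

```python
import torch
from rotary_embedding_torch import RotaryEmbedding, apply_rotary_emb

pos_emb = RotaryEmbedding(dim = 16, freqs_for = 'pixel', max_freq = 256)

# one frequency grid for a 2d feature map; queries and keys get rotated
# here rather than inside whatever attention module consumes them
freqs = pos_emb.get_axial_freqs(64, 64)

q = torch.randn(1, 8, 64, 64, 64)  # (batch, heads, height, width, head dim)
q = apply_rotary_emb(freqs, q)
```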
Hi, thank you very much for this handy rotary embedding library. I encountered this runtime error when the rotary embedding was trying to read the cached frequencies at the second `loss.backward()`...
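A hedged sketch of the training-loop shape that can trigger this, assuming the error is the usual "backward through the graph a second time" caused by frequencies computed from learned parameters on step one being cached and reused on step two:

```python
import torch
from rotary_embedding_torch import RotaryEmbedding

rotary = RotaryEmbedding(dim = 32, learned_freq = True)

for step in range(2):
    q = torch.randn(1, 8, 64, 64)
    out = rotary.rotate_queries_or_keys(q)
    loss = out.sum()
    loss.backward()  # second call can fail if cached frequencies are still
                     # attached to the first step's autograd graph
```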
Fixing reference to parameter
Hi! I'm interested in using the rotary embeddings with `use_xpos=True` so my transformer can extrapolate to longer sequence lengths. However, I noticed the readme mentions this technique works only with autoregressive transformers. Is there...
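A sketch of the intended usage, following the readme: with `use_xpos = True`, queries and keys must be rotated together so the xpos scaling reduces to a purely relative factor, which is also why the readme restricts it to autoregressive (causal) attention:

```python
import torch
from rotary_embedding_torch import RotaryEmbedding

rotary = RotaryEmbedding(dim = 32, use_xpos = True)

q = torch.randn(1, 8, 1024, 64)
k = torch.randn(1, 8, 1024, 64)

# rotate jointly; the per-position scales applied to q and k are reciprocal
# and only cancel correctly when key positions precede query positions
q, k = rotary.rotate_queries_and_keys(q, k)
```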
In your demo code, the dim of q is 64 while the dim of RotaryEmbedding is 32. I checked the code: q channels with index larger than 32 will not be rotated...
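This is partial rotary application over the feature dimension, not over positions: only the first `dim` channels of each head are rotated and the rest pass through. A small check, assuming the trailing channels are left untouched:

```python
import torch
from rotary_embedding_torch import RotaryEmbedding

rotary = RotaryEmbedding(dim = 32)   # rotary dim intentionally half the head dim
q = torch.randn(1, 8, 1024, 64)      # head dim 64, as in the readme demo

out = rotary.rotate_queries_or_keys(q)

# channels 32.. are passed through unchanged; rotating only part of the
# head dim is a common choice (e.g. GPT-NeoX), so dim = 32 here is deliberate
assert torch.allclose(out[..., 32:], q[..., 32:])
```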
Hi, I am debugging an issue with [my model](https://github.com/thorinf/simple-diffusion-lm) not learning longer contexts. It could be countless things, but I wanted to check if there are required tricks, or best...
Hello, thank you for the amazing work! I had a brief question: shouldn't `(n r)` in repeat be `(r n)` [here](https://github.com/lucidrains/rotary-embedding-torch/blob/783d17820ac1e75e918ae2128ab8bbcbe4985362/rotary_embedding_torch/rotary_embedding_torch.py#L277)? Since `(r n) != (n r)`, as `(r n)` would be...
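The two patterns are indeed different; a quick einops check shows why `(n r)` (interleaved duplication) is the one that pairs adjacent feature channels with the same rotation angle:

```python
import torch
from einops import repeat

freqs = torch.tensor([10., 20., 30.])

# '(n r)': n is the outer (slower) index, so each frequency is duplicated
# in place -> adjacent channel pairs share one rotation angle
repeat(freqs, 'n -> (n r)', r = 2)   # tensor([10., 10., 20., 20., 30., 30.])

# '(r n)': tiles the whole sequence instead -> wrong channel pairing
repeat(freqs, 'n -> (r n)', r = 2)   # tensor([10., 20., 30., 10., 20., 30.])
```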
I haven't investigated this yet, but the latest commit makes the MeshGPT tests fail, and users get the error below: ``` File /usr/local/lib/python3.10/dist-packages/local_attention/transformer.py:152, in LocalMHA.forward(self, x, mask, attn_bias, cache, return_cache) 149...