annotated_deep_learning_paper_implementations

question about RoPE code

yukyeongmin opened this issue on Nov 15 '23 · 3 comments

https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/f42c0e9cf49eedd003b34bc4e7a58e0157eae332/labml_nn/transformers/rope/__init__.py#L188

self.cos_cached and self.sin_cached have the same shape as x, don't they?

So if this line is intended to compute RoPE on only part of x, i.e. x[..., :self.d], I think it should be: x_rope = (x_rope * self.cos_cached[..., :self.d]) + (neg_half_x * self.sin_cached[..., :self.d])
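
For example, here is a rough standalone sketch of what I have in mind (just an illustration under my assumption above that the cached tensors span the full feature dimension of x, not the repository's code):

```python
import torch

def apply_rope_partial(x: torch.Tensor, cos_cached: torch.Tensor,
                       sin_cached: torch.Tensor, d: int) -> torch.Tensor:
    """Apply rotary embeddings to the first `d` features of `x` and pass the
    remaining features through unchanged."""
    x_rope, x_pass = x[..., :d], x[..., d:]
    # Build [-x_{d/2..d}, x_{0..d/2}] so the pairwise rotation can be written
    # with element-wise multiplies.
    neg_half_x = torch.cat([-x_rope[..., d // 2:], x_rope[..., :d // 2]], dim=-1)
    # Slice the cached tensors on the last axis as well, since (by assumption
    # here) they cover the full feature dimension of `x`.
    x_rope = (x_rope * cos_cached[..., :d]) + (neg_half_x * sin_cached[..., :d])
    return torch.cat((x_rope, x_pass), dim=-1)
```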

Please let me know if I'm wrong.

yukyeongmin · Nov 15 '23

You are correct that self.cos_cached and self.sin_cached have the same shape as x.

And regarding the modification, that is also correct, because it would ensure that the rotary embeddings are applied only to the subset of features specified by self.d.

nagamonish · Nov 26 '23

They have similar shapes. The cached sin/cos are truncated to x.shape[0], i.e. to the sequence length, because the sequence length (the number of tokens per sample) changes from batch to batch.
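
Roughly, the idea is something like this simplified sketch (not the exact code in the repo): build the cache once for the longest sequence seen so far, then slice it along the sequence dimension to match the current batch:

```python
import torch

class SimpleRoPECache:
    """Sketch: cache cos/sin for up to the longest sequence seen and slice to
    the current sequence length. Shapes follow the convention
    [seq_len, 1, 1, d] so they broadcast over batch and heads."""

    def __init__(self, d: int, base: float = 10_000.0):
        self.d = d
        self.base = base
        self.cos_cached = None
        self.sin_cached = None

    def _build_cache(self, seq_len: int):
        # theta_i = base^(-2i/d) for i in [0, d/2)
        theta = 1.0 / (self.base ** (torch.arange(0, self.d, 2).float() / self.d))
        pos = torch.arange(seq_len).float()
        idx_theta = torch.einsum('s,t->st', pos, theta)          # [seq_len, d/2]
        idx_theta2 = torch.cat([idx_theta, idx_theta], dim=-1)   # [seq_len, d]
        self.cos_cached = idx_theta2.cos()[:, None, None, :]
        self.sin_cached = idx_theta2.sin()[:, None, None, :]

    def get(self, x: torch.Tensor):
        seq_len = x.shape[0]
        # (Re)build the cache only when the incoming sequence is longer
        # than what is already cached.
        if self.cos_cached is None or seq_len > self.cos_cached.shape[0]:
            self._build_cache(seq_len)
        # Truncate along the sequence dimension to match this batch.
        return self.cos_cached[:seq_len], self.sin_cached[:seq_len]
```

So for an input x of shape [seq_len, batch, heads, d], the returned cos/sin have shape [seq_len, 1, 1, d] and broadcast over the batch and head dimensions.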

vpj · Nov 26 '23

Thanks for the reply! @vpj @nagamonish

Didn't you have any problems running that code? The original code didn't work for me with a different input shape, and I thought it was a problem with the code itself.

yukyeongmin · Nov 26 '23

Fixed the test code here: https://github.com/labmlai/annotated_deep_learning_paper_implementations/commit/2236f6383ce66bb25f1880512a4ad0ec8f37514a

vpj · Jun 20 '24