annotated_deep_learning_paper_implementations

question about RoPE code

yukyeongmin opened this issue on Nov 15 '23 · 3 comments

https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/f42c0e9cf49eedd003b34bc4e7a58e0157eae332/labml_nn/transformers/rope/__init__.py#L188

self.cos_cached and self.sin_cached have the same shape as x, don't they?

So if this line is intended to compute RoPE on only part of x, i.e. x[..., :self.d], I think it should be: x_rope = (x_rope * self.cos_cached[..., :self.d]) + (neg_half_x * self.sin_cached[..., :self.d])
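
For example, here is a rough standalone sketch of what I have in mind (just an illustration under my assumption above that the cached tensors span the full feature dimension of x, not the repository's code):

```python
import torch

def apply_rope_partial(x: torch.Tensor, cos_cached: torch.Tensor,
                       sin_cached: torch.Tensor, d: int) -> torch.Tensor:
    """Apply rotary embeddings to the first `d` features of `x` and pass the
    remaining features through unchanged."""
    x_rope, x_pass = x[..., :d], x[..., d:]
    # Build [-x_{d/2..d}, x_{0..d/2}] so the pairwise rotation can be written
    # with element-wise multiplies.
    neg_half_x = torch.cat([-x_rope[..., d // 2:], x_rope[..., :d // 2]], dim=-1)
    # Slice the cached tensors on the last axis as well, since (by assumption
    # here) they cover the full feature dimension of `x`.
    x_rope = (x_rope * cos_cached[..., :d]) + (neg_half_x * sin_cached[..., :d])
    return torch.cat((x_rope, x_pass), dim=-1)
```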

Please let me know if I'm wrong.

yukyeongmin · Nov 15 '23

You are correct that self.cos_cached and self.sin_cached have the same shape as x.

And regarding the modification, that is also correct, because it would ensure that the rotary embeddings are applied only to the subset of features specified by self.d.

nagamonish · Nov 26 '23

They have similar shapes. The cached sin/cos are truncated to x.shape[0], i.e. to the sequence length, because the sequence length (the number of tokens per sample) changes from batch to batch.
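
Roughly, the idea is something like this simplified sketch (not the exact code in the repo): build the cache once for the longest sequence seen so far, then slice it along the sequence dimension to match the current batch:

```python
import torch

class SimpleRoPECache:
    """Sketch: cache cos/sin for up to the longest sequence seen and slice to
    the current sequence length. Shapes follow the convention
    [seq_len, 1, 1, d] so they broadcast over batch and heads."""

    def __init__(self, d: int, base: float = 10_000.0):
        self.d = d
        self.base = base
        self.cos_cached = None
        self.sin_cached = None

    def _build_cache(self, seq_len: int):
        # theta_i = base^(-2i/d) for i in [0, d/2)
        theta = 1.0 / (self.base ** (torch.arange(0, self.d, 2).float() / self.d))
        pos = torch.arange(seq_len).float()
        idx_theta = torch.einsum('s,t->st', pos, theta)          # [seq_len, d/2]
        idx_theta2 = torch.cat([idx_theta, idx_theta], dim=-1)   # [seq_len, d]
        self.cos_cached = idx_theta2.cos()[:, None, None, :]
        self.sin_cached = idx_theta2.sin()[:, None, None, :]

    def get(self, x: torch.Tensor):
        seq_len = x.shape[0]
        # (Re)build the cache only when the incoming sequence is longer
        # than what is already cached.
        if self.cos_cached is None or seq_len > self.cos_cached.shape[0]:
            self._build_cache(seq_len)
        # Truncate along the sequence dimension to match this batch.
        return self.cos_cached[:seq_len], self.sin_cached[:seq_len]
```

So for an input x of shape [seq_len, batch, heads, d], the returned cos/sin have shape [seq_len, 1, 1, d] and broadcast over the batch and head dimensions.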

vpj · Nov 26 '23

Thanks for the reply! @vpj @nagamonish

Didn't you have any problems running that code? The original code didn't work for me with a different input shape, and I thought it was a problem with the code itself.

yukyeongmin · Nov 26 '23

Fixed the test code here: https://github.com/labmlai/annotated_deep_learning_paper_implementations/commit/2236f6383ce66bb25f1880512a4ad0ec8f37514a

vpj · Jun 20 '24