annotated_deep_learning_paper_implementations
Question about RoPE code
I noticed that the RoPE implementation here differs from the paper, mainly in how the dimensions are permuted. Does this difference affect the final result? I'm not sure I'm thinking about this correctly, so I'd sincerely appreciate your advice : )
Paper version should be:
version in this repo:
The ordering of the feature dimensions is different. So it won't affect training from scratch, but you can't load a model trained with a different ordering.
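To make the ordering point concrete, here is a minimal pure-Python sketch. It assumes the two versions being compared are the interleaved pairing (dimensions `2i` and `2i+1` rotated together, as in the paper) and the split-halves pairing (dimensions `i` and `i + d/2` rotated together). The function names, the helper `to_half_layout`, and `base=10000.0` are illustrative choices of mine, not code from this repo.

```python
import math

def rope_angles(pos, d, base=10000.0):
    # One rotation angle per pair of features: pos * base^(-2i/d).
    return [pos * base ** (-2.0 * i / d) for i in range(d // 2)]

def rope_interleaved(x, pos):
    # Assumed "paper" layout: rotate the pair (x[2i], x[2i+1]).
    d = len(x)
    out = [0.0] * d
    for i, a in enumerate(rope_angles(pos, d)):
        c, s = math.cos(a), math.sin(a)
        x1, x2 = x[2 * i], x[2 * i + 1]
        out[2 * i] = x1 * c - x2 * s
        out[2 * i + 1] = x1 * s + x2 * c
    return out

def rope_half_split(x, pos):
    # Assumed split-halves layout: rotate the pair (x[i], x[i + d/2]).
    d = len(x)
    h = d // 2
    out = [0.0] * d
    for i, a in enumerate(rope_angles(pos, d)):
        c, s = math.cos(a), math.sin(a)
        x1, x2 = x[i], x[i + h]
        out[i] = x1 * c - x2 * s
        out[i + h] = x1 * s + x2 * c
    return out

def to_half_layout(x):
    # Fixed permutation: even-indexed features first, then odd-indexed ones.
    return x[0::2] + x[1::2]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

q = [0.1 * k for k in range(8)]
k = [0.3 - 0.05 * j for j in range(8)]

# Same rotation, different feature order.
same_up_to_perm = to_half_layout(rope_interleaved(q, 5))
half_on_permuted = rope_half_split(to_half_layout(q), 5)

# Attention scores agree once q and k are permuted consistently.
score_interleaved = dot(rope_interleaved(q, 5), rope_interleaved(k, 2))
score_half = dot(rope_half_split(to_half_layout(q), 5),
                 rope_half_split(to_half_layout(k), 2))

# Feeding unpermuted features to the other convention gives different values.
mismatched = rope_half_split(q, 5)
```

A model trained from scratch can learn its projection weights in either layout, since the two differ only by a fixed permutation of the features. Weights trained under one layout produce features in that order, so applying the other rotation to them gives different attention scores unless the projection outputs are permuted first.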
Thanks for your answer : ) Is there a reason the latter implementation is more widely used in code than the former?
It's easier to code
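As a rough illustration of why it can be easier to code, here is a sketch of the split-halves form written with whole-vector operations, assuming that is the layout the latter implementation uses. The name `rotate_half` and `base=10000.0` are my own illustrative choices; plain Python lists stand in for tensors.

```python
import math

def rotate_half(x):
    # [x_a, x_b] -> [-x_b, x_a], where x_a and x_b are the two halves.
    h = len(x) // 2
    return [-v for v in x[h:]] + x[:h]

def rope_half_split(x, pos, base=10000.0):
    d = len(x)
    angles = [pos * base ** (-2.0 * i / d) for i in range(d // 2)]
    # The same angles serve both halves, so cos and sin are just repeated.
    cos = [math.cos(a) for a in angles] * 2
    sin = [math.sin(a) for a in angles] * 2
    rot = rotate_half(x)
    # Element-wise: x * cos + rotate_half(x) * sin.
    return [xi * c + ri * s for xi, c, ri, s in zip(x, cos, rot, sin)]

# A 2-dimensional vector rotated by a quarter turn.
quarter_turn = rope_half_split([1.0, 0.0], math.pi / 2)
```

With tensors, this needs only one slice, one concatenation, and two element-wise multiplies. The interleaved form usually needs strided slicing or a reshape into pairs, plus a step to interleave the results back.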