annotated_deep_learning_paper_implementations Question about RoPE code

Open rangehow opened this issue 1 year ago • 2 comments

I found that the RoPE implementation here differs from the paper, mostly in how the dimensions are permuted. Does this difference not affect the final result? I'm not sure I'm thinking about this correctly. I'd sincerely appreciate your advice : )

Paper version should be:

version in this repo:

May 11 '24 09:05 rangehow

The ordering is different. So it won't affect training from scratch, but you can't load a model trained with a different ordering.
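
To make the ordering point concrete, here is a minimal pure-Python sketch. It assumes the two versions being compared are the adjacent-pair form from the paper, which rotates (x[0], x[1]), (x[2], x[3]), ..., and a half-split form, which rotates (x[0], x[d/2]), (x[1], x[d/2 + 1]), .... The function names, the base of 10000, and the sample vector are illustrative, not taken from the repo. The two forms give the same numbers once the features are permuted consistently, and different numbers when the same vector is fed to both, which is why weights trained with one ordering do not transfer to the other.

```python
import math

def rope_angles(pos, d, base=10000.0):
    # One angle per feature pair: theta_i = pos * base^(-2i/d), i = 0..d/2-1
    return [pos * base ** (-2.0 * i / d) for i in range(d // 2)]

def rope_interleaved(x, pos):
    # Adjacent pairing: (x[0], x[1]), (x[2], x[3]), ...
    d = len(x)
    out = [0.0] * d
    for i, theta in enumerate(rope_angles(pos, d)):
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out

def rope_half(x, pos):
    # Half-split pairing: (x[0], x[d/2]), (x[1], x[d/2 + 1]), ...
    d = len(x)
    h = d // 2
    out = [0.0] * d
    for i, theta in enumerate(rope_angles(pos, d)):
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + h]
        out[i] = a * c - b * s
        out[i + h] = a * s + b * c
    return out

def to_half_order(x):
    # Permutation: even-indexed features first, then odd-indexed features.
    return x[0::2] + x[1::2]

x = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5, 3.0, -2.0]  # arbitrary sample vector
pos = 7

# Permute the interleaved result vs. run the half-split form on permuted input.
a = to_half_order(rope_interleaved(x, pos))
b = rope_half(to_half_order(x), pos)
same_up_to_permutation = all(
    math.isclose(u, v, abs_tol=1e-12) for u, v in zip(a, b)
)

# Feed the same, unpermuted vector to both forms.
same_without_permutation = all(
    math.isclose(u, v, abs_tol=1e-12)
    for u, v in zip(rope_interleaved(x, pos), rope_half(x, pos))
)

print(same_up_to_permutation, same_without_permutation)
```

When training from scratch, the query and key projections simply learn their features in whichever order the code expects, so attention scores are unaffected as long as queries and keys use the same convention.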

May 20 '24 08:05 vpj

Thanks for your answer : ) Is there a reason the latter implementation is more widely used in code than the former?

May 20 '24 08:05 rangehow

It's easier to code.
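
One way to read "easier" is through the helper that builds the rotated partner of each feature, so that the result is `x * cos + rotate(x) * sin`. This is a hedged sketch with hypothetical helper names, assuming the half-split and adjacent-pair layouts. With the half split, the helper is a single negate and concatenate; with adjacent pairs, it needs strided reads and writes to re-interleave the result.

```python
def rotate_half(x):
    # Half-split layout: [-x[d/2:], x[:d/2]], one negate and one concatenation.
    h = len(x) // 2
    return [-v for v in x[h:]] + x[:h]

def rotate_interleaved(x):
    # Adjacent-pair layout: each pair (a, b) becomes (-b, a), which needs
    # strided access to read the pairs and to write them back in place.
    out = [0.0] * len(x)
    out[0::2] = [-v for v in x[1::2]]
    out[1::2] = x[0::2]
    return out

x = [1.0, 2.0, 3.0, 4.0]
print(rotate_half(x))         # [-3.0, -4.0, 1.0, 2.0]
print(rotate_interleaved(x))  # [-2.0, 1.0, -4.0, 3.0]
```

In a tensor library the same contrast appears as a slice plus a concatenation versus strided indexing or a reshape.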

Jun 20 '24 07:06 vpj
