annotated_deep_learning_paper_implementations

Question about RoPE code

Open rangehow opened this issue 1 year ago • 2 comments

I found a difference between the RoPE implementation here and the one in the paper, mostly in how the dimensions are permuted. Does this difference really not affect the final result? I'm not quite sure I'm thinking about this correctly. I'd sincerely appreciate your advice : )

Paper version should be: image

version in this repo: image

rangehow, May 11 '24 09:05

The ordering of the dimensions is different. So it won't affect training from scratch, but you can't load a model that was trained with a different ordering.

vpj, May 20 '24 08:05
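
To make the ordering point concrete, here is a minimal sketch in plain Python. The images from the question are not reproduced here, so it assumes the two variants are (a) rotating adjacent pairs `(x[0], x[1]), (x[2], x[3]), ...` and (b) rotating pairs `(x[i], x[i + d/2])`. The function names and `base=10000.0` are illustrative assumptions, not this repo's API. The sketch checks that the two variants agree up to a fixed permutation of the feature dimensions, so attention scores match when queries and keys are permuted consistently, and differ when the permutation is skipped.

```python
import math

def rope_angles(pos, d, base=10000.0):
    # One rotation angle per pair of dimensions (base is an assumed default).
    return [pos * base ** (-2 * i / d) for i in range(d // 2)]

def rope_interleaved(x, pos):
    # Assumed "adjacent pairs" variant: rotate (x[2i], x[2i+1]).
    out = [0.0] * len(x)
    for i, t in enumerate(rope_angles(pos, len(x))):
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * math.cos(t) - b * math.sin(t)
        out[2 * i + 1] = a * math.sin(t) + b * math.cos(t)
    return out

def rope_half_split(x, pos):
    # Assumed "half split" variant: rotate (x[i], x[i + d/2]).
    h = len(x) // 2
    out = [0.0] * len(x)
    for i, t in enumerate(rope_angles(pos, len(x))):
        a, b = x[i], x[i + h]
        out[i] = a * math.cos(t) - b * math.sin(t)
        out[i + h] = a * math.sin(t) + b * math.cos(t)
    return out

def to_half_split(x):
    # Fixed permutation: [a0, b0, a1, b1, ...] -> [a0, a1, ..., b0, b1, ...].
    return x[0::2] + x[1::2]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

q = [0.3, -1.2, 0.7, 2.0]
k = [1.1, 0.4, -0.5, 0.9]

# Same attention score once q and k are permuted consistently.
score_interleaved = dot(rope_interleaved(q, 5), rope_interleaved(k, 2))
score_half_split = dot(rope_half_split(to_half_split(q), 5),
                       rope_half_split(to_half_split(k), 2))

# Mixing orderings (no permutation applied) gives a different score.
score_mismatched = dot(rope_half_split(q, 5), rope_half_split(k, 2))
```

A model trained from scratch can absorb the permutation into its query and key projection weights, which is why either ordering trains equally well. Weights trained under one ordering have that permutation baked in, so they produce different scores under the other ordering.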

Thanks for your answer : ) Is there a reason why the latter implementation is more widely used in code than the former?

rangehow, May 20 '24 08:05

It's easier to code

vpj, Jun 20 '24 07:06
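
One plausible reading of "easier to code" is that the half-split ordering needs only a contiguous slice and a concatenation, with no strided or interleaved indexing. The following is a hedged sketch in plain Python, not this repo's code: `rotate_half`, `rope_rotate_half`, and `base=10000.0` are hypothetical names and defaults chosen for illustration.

```python
import math

def rotate_half(x):
    # [a_0..a_{h-1}, b_0..b_{h-1}] -> [-b_0..-b_{h-1}, a_0..a_{h-1}]
    h = len(x) // 2
    return [-v for v in x[h:]] + x[:h]

def rope_rotate_half(x, pos, base=10000.0):
    # Half-split RoPE written as x * cos + rotate_half(x) * sin.
    d = len(x)
    theta = [pos * base ** (-2 * i / d) for i in range(d // 2)]
    cos = [math.cos(t) for t in theta] * 2  # same angles for both halves
    sin = [math.sin(t) for t in theta] * 2
    rot = rotate_half(x)
    return [xi * c + ri * s for xi, ri, c, s in zip(x, rot, cos, sin)]

x = [0.3, -1.2, 0.7, 2.0]
rotated = rope_rotate_half(x, pos=3)
```

With a tensor library the same pattern becomes two slices, one concatenation, and two elementwise multiplies, whereas the adjacent-pair ordering typically needs a reshape or strided indexing to separate even and odd dimensions.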