Extract RotaryEmbedding code for reuse across models.

Open janimo opened this issue 1 year ago • 1 comments

Most models use identical or almost identical copies of RotaryEmbedding. The differences are minor: cfg.rope_theta vs a hardcoded 10000, rope_theta being f32 or f64, and chunk() vs two calls to narrow(). A few others (mixformer, phi, chatglm) have implementations that differ more.
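To make the idea concrete, here is a minimal sketch of what a shared helper could look like. It is plain standard-library Rust operating on slices, not candle's tensor API, and the struct name, constructor arguments, and `apply` method are assumptions made for illustration only. It takes `rope_theta` as a parameter (as f64, cast to f32 after computing the angles) and rotates the two halves of a head vector, which is the layout that chunk() or a pair of narrow() calls would produce.

```rust
/// Hypothetical shared rotary-embedding helper (illustration only, not candle's API).
pub struct RotaryEmbedding {
    head_dim: usize,
    max_seq_len: usize,
    cos: Vec<f32>, // flattened [max_seq_len, head_dim / 2]
    sin: Vec<f32>, // flattened [max_seq_len, head_dim / 2]
}

impl RotaryEmbedding {
    /// Precomputes cos/sin tables. `rope_theta` comes from the caller
    /// (for example a model config) instead of being hardcoded.
    pub fn new(head_dim: usize, max_seq_len: usize, rope_theta: f64) -> Self {
        assert!(head_dim % 2 == 0, "head_dim must be even");
        let half = head_dim / 2;
        // inv_freq[i] = 1 / theta^(2i / head_dim)
        let inv_freq: Vec<f64> = (0..half)
            .map(|i| 1.0 / rope_theta.powf(2.0 * i as f64 / head_dim as f64))
            .collect();
        let mut cos = Vec::with_capacity(max_seq_len * half);
        let mut sin = Vec::with_capacity(max_seq_len * half);
        for pos in 0..max_seq_len {
            for f in &inv_freq {
                let angle = pos as f64 * f;
                cos.push(angle.cos() as f32);
                sin.push(angle.sin() as f32);
            }
        }
        Self { head_dim, max_seq_len, cos, sin }
    }

    /// Rotates one head vector in place for token position `pos`.
    /// Element i of the first half is paired with element i of the second half.
    pub fn apply(&self, x: &mut [f32], pos: usize) {
        assert_eq!(x.len(), self.head_dim, "unexpected head dimension");
        assert!(pos < self.max_seq_len, "position out of range");
        let half = self.head_dim / 2;
        let (lo, hi) = x.split_at_mut(half);
        for i in 0..half {
            let c = self.cos[pos * half + i];
            let s = self.sin[pos * half + i];
            let (a, b) = (lo[i], hi[i]);
            lo[i] = a * c - b * s;
            hi[i] = a * s + b * c;
        }
    }
}
```

With something along these lines, each model would only differ in the arguments it passes, e.g. `RotaryEmbedding::new(head_dim, max_seq_len, cfg.rope_theta as f64)`, where the field names on `cfg` are again assumed.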

I did not change Yi to use cfg.rope_theta: it has a hardcoded 10000 while the config has 5000000, and I cannot test this larger model.

janimo, Mar 19 '24 13:03

Would this make sense elsewhere, like transformers/utils?

janimo, Apr 12 '24 08:04