[Proposal] Implement LongRoPE
### Proposal
Implement LongRoPE, the RoPE extension method proposed by Microsoft.
### Motivation
LongRoPE is a key mechanism used in models such as microsoft/Phi-4-mini-instruct and microsoft/Phi-3.5-mini-instruct.
It requires a special initialization of the rotary frequencies (https://github.com/huggingface/transformers/blob/af9b2eaa54c150741f298d6db939af6328e1dc38/src/transformers/modeling_rope_utils.py#L242) and a runtime update that switches between two sets of per-dimension rescale factors depending on sequence length (https://huggingface.co/microsoft/Phi-4-mini-instruct/blob/d02e859785d6a6ee7cb2ed7913e32b7a0e8665b4/modeling_phi3.py#L339), neither of which TransformerLens currently appears to support.
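For reference, here is a minimal sketch of what the two linked pieces of code do, written against the attribute names used by the Phi-3 family configs. The standalone helper `longrope_inv_freq` is my own framing for this issue, not transformers or TransformerLens API:

```python
import math
import torch

def longrope_inv_freq(config, seq_len: int):
    """Sketch of LongRoPE's frequency computation, following the linked
    _compute_longrope_parameters in transformers."""
    base = config.rope_theta
    dim = config.hidden_size // config.num_attention_heads  # head dimension
    original_max = config.original_max_position_embeddings  # pretraining context
    max_pos = config.max_position_embeddings                # extended context

    # The "update": switch between the short and long per-dimension rescale
    # factors depending on whether the current sequence exceeds the
    # original training context.
    if seq_len > original_max:
        ext_factors = torch.tensor(config.rope_scaling["long_factor"], dtype=torch.float32)
    else:
        ext_factors = torch.tensor(config.rope_scaling["short_factor"], dtype=torch.float32)

    # Standard RoPE inverse frequencies, rescaled per dimension.
    inv_freq_shape = torch.arange(0, dim, 2, dtype=torch.float32) / dim
    inv_freq = 1.0 / (ext_factors * base**inv_freq_shape)

    # Attention scores are additionally scaled to compensate for the extension.
    scale = max_pos / original_max
    if scale <= 1.0:
        attention_factor = 1.0
    else:
        attention_factor = math.sqrt(1 + math.log(scale) / math.log(original_max))
    return inv_freq, attention_factor
```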
Without a corresponding implementation, the inference results of TransformerLens and the Hugging Face model diverge. Because the error is silent, results such as those in https://github.com/TransformerLensOrg/TransformerLens/issues/748 are likely wrong.
Deliberately setting rope_scaling to None on the Hugging Face model makes the inference results of the two implementations match, i.e. the rest of the architecture already agrees and LongRoPE is the only source of the discrepancy. Once LongRoPE is implemented, adding the affected models to the TransformerLens support list should therefore be straightforward; a comparison sketch follows.
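Something along these lines can serve as the sanity check; this is illustrative only and assumes the model is on the support list and that rope_scaling can be overridden via from_pretrained kwargs:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformer_lens import HookedTransformer

name = "microsoft/Phi-3.5-mini-instruct"
tokenizer = AutoTokenizer.from_pretrained(name)

# rope_scaling=None disables LongRoPE on the HF side, making plain-RoPE
# results comparable today; with scaling left enabled, logits should still
# match once TransformerLens implements LongRoPE.
hf_model = AutoModelForCausalLM.from_pretrained(name, rope_scaling=None)
tl_model = HookedTransformer.from_pretrained(name, hf_model=hf_model, tokenizer=tokenizer)

tokens = tokenizer("The quick brown fox", return_tensors="pt").input_ids
with torch.no_grad():
    hf_logits = hf_model(tokens).logits
    tl_logits = tl_model(tokens)
print(torch.allclose(hf_logits, tl_logits, atol=1e-4))
```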
### Pitch
Implement LongRoPE, together with the corresponding changes to HookedTransformerConfig needed to configure it.
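Concretely, the config side could look something like the following. All field names are placeholders meant to show what needs to be stored (mirroring the rope_scaling dict of the Phi-3 family), not a proposed final API:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class LongRoPEConfigFields:
    """Hypothetical additions to HookedTransformerConfig; names are placeholders."""
    longrope_short_factor: Optional[List[float]] = None  # per-dim rescale, seq_len <= original context
    longrope_long_factor: Optional[List[float]] = None   # per-dim rescale, seq_len > original context
    longrope_original_context_length: Optional[int] = None  # pretraining context window
```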
### Alternatives
No alternatives identified yet.
### Checklist
- [x] I have checked that there is no similar issue in the repo (required)