
[Proposal] Implement LongRoPE

Status: Open · YuhengHuang42 opened this issue 9 months ago · 0 comments

Proposal

Implement LongRoPE proposed by Microsoft

Motivation

LongRoPE is a key mechanism implemented in models such as microsoft/Phi-4-mini-instruct and microsoft/Phi-3.5-mini-instruct.

This mechanism requires a non-standard initialization (https://github.com/huggingface/transformers/blob/af9b2eaa54c150741f298d6db939af6328e1dc38/src/transformers/modeling_rope_utils.py#L242) and a sequence-length-dependent update of the rotary embeddings (https://huggingface.co/microsoft/Phi-4-mini-instruct/blob/d02e859785d6a6ee7cb2ed7913e32b7a0e8665b4/modeling_phi3.py#L339), neither of which the current TransformerLens appears to support.
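
For reference, the Hugging Face logic linked above boils down to roughly the following. This is a simplified sketch, not the exact upstream code; `seq_len` here stands for the current sequence length:

```python
import math
import torch

def longrope_inv_freq_sketch(
    head_dim: int,
    base: float,
    short_factor: list,   # per-dimension rescale factors, len == head_dim // 2
    long_factor: list,    # per-dimension rescale factors, len == head_dim // 2
    max_position_embeddings: int,
    original_max_position_embeddings: int,
    seq_len: int,
):
    """Simplified paraphrase of HF's _compute_longrope_parameters."""
    # Update step: pick the long or short factors depending on whether the
    # current sequence exceeds the original training context.
    if seq_len > original_max_position_embeddings:
        ext_factors = torch.tensor(long_factor, dtype=torch.float32)
    else:
        ext_factors = torch.tensor(short_factor, dtype=torch.float32)

    # Initialization step: standard RoPE inverse frequencies, divided by
    # the chosen per-dimension factors.
    inv_freq_shape = torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim
    inv_freq = 1.0 / (ext_factors * base**inv_freq_shape)

    # Attention scaling factor applied to cos/sin when the context is extended.
    scale = max_position_embeddings / original_max_position_embeddings
    if scale <= 1.0:
        attention_factor = 1.0
    else:
        attention_factor = math.sqrt(
            1 + math.log(scale) / math.log(original_max_position_embeddings)
        )
    return inv_freq, attention_factor
```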

Without a corresponding implementation, the inference results of TransformerLens diverge from those of the Hugging Face model. Because this is a silent error, downstream results such as https://github.com/TransformerLensOrg/TransformerLens/issues/748 are likely to be wrong.

Deliberately setting rope_scaling to None on the Hugging Face model makes the two implementations produce identical inference results, which confirms the discrepancy comes from LongRoPE alone. Once LongRoPE is implemented, adding the related models to the TransformerLens support list should therefore be straightforward.
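
A minimal sketch of that equivalence check, assuming Phi-3.5-mini can be loaded in both libraries (the model name and exact loading arguments are illustrative):

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
from transformer_lens import HookedTransformer

model_name = "microsoft/Phi-3.5-mini-instruct"

# Disable LongRoPE on the Hugging Face side so both implementations use
# plain RoPE.
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = None
hf_model = AutoModelForCausalLM.from_pretrained(model_name, config=config)
tokenizer = AutoTokenizer.from_pretrained(model_name)

tl_model = HookedTransformer.from_pretrained(
    model_name, hf_model=hf_model, tokenizer=tokenizer
)

tokens = tokenizer("LongRoPE test prompt", return_tensors="pt").input_ids
with torch.no_grad():
    hf_logits = hf_model(tokens).logits
    tl_logits = tl_model(tokens)

# With rope_scaling disabled, the max absolute difference should be small.
print(torch.max(torch.abs(hf_logits - tl_logits)))
```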

Pitch

Implement LongRoPE, along with the corresponding changes to HookedTransformerConfig to configure it; one possible shape for the config change is sketched below.
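
A possible shape for the configuration change. The field and class names here are hypothetical, not existing TransformerLens API:

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical config block; names are illustrative only.
@dataclass
class LongRoPEScalingConfig:
    short_factor: List[float]              # per-dim factors for short contexts
    long_factor: List[float]               # per-dim factors for long contexts
    original_max_position_embeddings: int  # pre-extension training context
    attention_factor: Optional[float] = None  # None: derive from the log formula

# HookedTransformerConfig could then gain a field such as
#   rope_scaling: Optional[LongRoPEScalingConfig] = None
# which the attention / rotary-embedding code consults when building
# rotary frequencies.
```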

Alternatives

None considered yet.

Checklist

  • [x] I have checked that there is no similar issue in the repo (required)

YuhengHuang42 · Mar 10 '25 17:03