[Bug] Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'original_max_position_embeddings'}
### Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- [x] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- [x] 5. Please use English, otherwise it will be closed.
### Describe the bug
I am using Qwen2.5-72B, which supports positional extrapolation via YaRN by adding this config (copied from https://huggingface.co/Qwen/Qwen2.5-72B-Instruct):
```json
{
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```
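For context, with this config the YaRN-extended context length should be `original_max_position_embeddings × factor`. A minimal sketch of that arithmetic (variable names are mine, not sglang's):

```python
# Arithmetic only: what the YaRN-extended context length should be.
rope_scaling = {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}
derived = int(rope_scaling["original_max_position_embeddings"] * rope_scaling["factor"])
print(derived)  # 131072, which comfortably covers the requested 128000
```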
However, this does not seem to be supported by sglang; when I specify context_length=128000, I get:
```
ValueError: User-specified context_length (128000) is greater than the derived context_length (32768). This may lead to incorrect model outputs or CUDA errors. Note that the derived context_length may differ from max_position_embeddings in the model's config. To allow overriding this maximum, set the env var SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1
```
I am not sure whether setting SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 would make YaRN work as it normally should. Besides, I also get:
```
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'original_max_position_embeddings'}
```
I need some help; this config works well in vLLM.
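For what it's worth, here is how I would apply the override from the error message (a sketch; `--model-path` and `--context-length` are standard sglang launcher flags, and whether the override actually yields correct YaRN behavior is exactly my question):

```python
# Sketch: launch the sglang server with the override env var from the error.
import os
import subprocess

env = dict(os.environ, SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN="1")
subprocess.run(
    [
        "python", "-m", "sglang.launch_server",
        "--model-path", "Qwen/Qwen2.5-72B-Instruct",
        "--context-length", "128000",
    ],
    env=env,
    check=True,
)
```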
### Reproduction
.
### Environment
```
Python: 3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0]
CUDA available: True
GPU 0,1,2,3: NVIDIA A100-SXM4-80GB
GPU 0,1,2,3 Compute Capability: 8.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.1, V12.1.105
CUDA Driver Version: 470.103.01
PyTorch: 2.5.1+cu121
sglang: 0.4.1.post5
flashinfer: 0.2.0.post1+cu121torch2.4
triton: 3.1.0
transformers: 4.47.1
torchao: 0.7.0
numpy: 1.26.4
aiohttp: 3.11.9
fastapi: 0.115.6
hf_transfer: Module Not Found
huggingface_hub: 0.26.3
interegular: 0.3.3
modelscope: Module Not Found
orjson: 3.10.12
packaging: 24.0
psutil: 6.1.0
pydantic: 2.10.3
multipart: 0.0.20
zmq: 26.2.0
uvicorn: 0.32.1
uvloop: 0.21.0
vllm: 0.6.4.post1
xgrammar: Module Not Found
openai: 1.56.2
anthropic: Module Not Found
litellm: Module Not Found
decord: Module Not Found

NVIDIA Topology:
        GPU0   GPU1   GPU2   GPU3   mlx5_0 mlx5_1 mlx5_2 mlx5_3 mlx5_4 mlx5_5 mlx5_6 mlx5_7 CPU Affinity   NUMA Affinity
GPU0     X     NV12   NV12   NV12   SYS    SYS    SYS    SYS    PXB    PXB    NODE   NODE   48-95,144-191  1
GPU1    NV12    X     NV12   NV12   SYS    SYS    SYS    SYS    PXB    PXB    NODE   NODE   48-95,144-191  1
GPU2    NV12   NV12    X     NV12   SYS    SYS    SYS    SYS    NODE   NODE   PXB    PXB    48-95,144-191  1
GPU3    NV12   NV12   NV12    X     SYS    SYS    SYS    SYS    NODE   NODE   PXB    PXB    48-95,144-191  1
mlx5_0  SYS    SYS    SYS    SYS     X     PIX    NODE   NODE   SYS    SYS    SYS    SYS
mlx5_1  SYS    SYS    SYS    SYS    PIX     X     NODE   NODE   SYS    SYS    SYS    SYS
mlx5_2  SYS    SYS    SYS    SYS    NODE   NODE    X     PIX    SYS    SYS    SYS    SYS
mlx5_3  SYS    SYS    SYS    SYS    NODE   NODE   PIX     X     SYS    SYS    SYS    SYS
mlx5_4  PXB    PXB    NODE   NODE   SYS    SYS    SYS    SYS     X     PIX    NODE   NODE
mlx5_5  PXB    PXB    NODE   NODE   SYS    SYS    SYS    SYS    PIX     X     NODE   NODE
mlx5_6  NODE   NODE   PXB    PXB    SYS    SYS    SYS    SYS    NODE   NODE    X     PIX
mlx5_7  NODE   NODE   PXB    PXB    SYS    SYS    SYS    SYS    NODE   NODE   PIX     X

Legend:
  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

ulimit soft: 1048576
```
@zhyncs Who is on the long context part?
Seems related to this PR https://github.com/sgl-project/sglang/pull/757
scaling_factor is set to 1, which leads to a wrong context_length.
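To illustrate: if the derivation forces the scaling factor to 1 whenever `original_max_position_embeddings` is present, the derived context length collapses back to the base window. A simplified sketch of the suspected logic (not sglang's actual code):

```python
# Simplified sketch of the suspected derivation bug; not sglang's actual code.
def derive_context_length(max_position_embeddings: int, rope_scaling: dict) -> int:
    rope_scaling_factor = rope_scaling.get("factor", 1.0)
    # Suspect branch: presence of original_max_position_embeddings resets the
    # factor to 1, discarding the YaRN extension entirely.
    if "original_max_position_embeddings" in rope_scaling:
        rope_scaling_factor = 1
    return int(max_position_embeddings * rope_scaling_factor)

cfg = {"factor": 4.0, "original_max_position_embeddings": 32768, "type": "yarn"}
print(derive_context_length(32768, cfg))  # 32768 -- the wrong value from the error
print(int(32768 * cfg["factor"]))         # 131072 -- what YaRN should give
```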
BTW, I also saw "Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'original_max_position_embeddings'}" in vLLM. Seems unrelated.
Thanks. We will check and see.
Has anyone found a fix for this? I am seeing this too.
I've looked into this. It seems removing these lines can fix it:
if "original_max_position_embeddings" in rope_scaling:
rope_scaling_factor = 1
@rangehow Could you try this?
if "original_max_position_embeddings" in rope_scaling:
rope_scaling_factor = 1
Also, I have asked the Qwen team for help.
```
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'original_max_position_embeddings'}
```
I've only encountered this warning, and no errors are reported. So, can this warning be ignored?
I will ask the Qwen team about this.
Changing the key from `type` to `rope_type` matches what transformers expects, and it removes the warning. But does this confirm that YaRN is implemented and working?
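For anyone wanting to try that rename on a local snapshot, a small sketch (the config path is a placeholder):

```python
# Sketch: rename "type" -> "rope_type" in rope_scaling so it matches what
# newer transformers versions expect. The path below is a placeholder.
import json

config_path = "/path/to/Qwen2.5-72B-Instruct/config.json"  # placeholder
with open(config_path) as f:
    config = json.load(f)

rope_scaling = config.get("rope_scaling") or {}
if "type" in rope_scaling and "rope_type" not in rope_scaling:
    rope_scaling["rope_type"] = rope_scaling.pop("type")
    config["rope_scaling"] = rope_scaling

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```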
It seems to be fixed.
Previously, transformers would raise warnings about unrecognized keys despite this being a valid configuration parameter. This was fixed in https://github.com/huggingface/transformers/pull/36877.
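A quick way to check how your installed transformers parses this config (downloads the config from the Hub):

```python
# Quick check: inspect rope_scaling as parsed by the installed transformers.
# On versions predating the fix linked above, loading a YaRN config containing
# original_max_position_embeddings emits the "Unrecognized keys" warning;
# on patched versions it should not.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen2.5-72B-Instruct")
print(cfg.rope_scaling)
```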