
[Bug]: loading qwen2-vl-7b fails with error: `assert "factor" in rope_scaling`

Open · abacaj opened this issue 1 year ago · 2 comments

Your current environment

The output of `python collect_env.py`
Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] lion-pytorch==0.1.2
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.23.5
[pip3] nvidia-cublas-cu11==11.10.3.66
[pip3] nvidia-cublas-cu12==12.1.3.1
[pip3] nvidia-cuda-cupti-cu11==11.7.101
[pip3] nvidia-cuda-cupti-cu12==12.1.105
[pip3] nvidia-cuda-nvrtc-cu11==11.7.99
[pip3] nvidia-cuda-nvrtc-cu12==12.1.105
[pip3] nvidia-cuda-runtime-cu11==11.7.99
[pip3] nvidia-cuda-runtime-cu12==12.1.105
[pip3] nvidia-cudnn-cu11==8.5.0.96
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu11==10.9.0.58
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu11==10.2.10.91
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu11==11.4.0.1
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu11==11.7.4.91
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-ml-py==12.555.43
[pip3] nvidia-nccl-cu11==2.14.3
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] nvidia-nvjitlink-cu12==12.3.52
[pip3] nvidia-nvtx-cu11==11.7.91
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] pynvml==11.5.0
[pip3] pyzmq==25.1.0
[pip3] sentence-transformers==2.2.2
[pip3] torch==2.4.0
[pip3] torchvision==0.19.0
[pip3] transformers==4.45.0.dev0
[pip3] transformers-stream-generator==0.0.4
[pip3] triton==3.0.0
[pip3] vllm-nccl-cu12==2.18.1.0.3.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.1@3fd2b0d21cd9ec78de410fdf8aa1de840e9ad77a
vLLM Build Flags

🐛 Describe the bug

Traceback (most recent call last):
  File "/home/anton/personal/transformer-experiments/inference/vllm_multi.py", line 21, in <module>
    run_server(args)
  File "/home/anton/personal/transformer-experiments/inference/vllm_multi.py", line 9, in run_server
    llm = load_model(args.model, 8192, args.gpu)
  File "/home/anton/personal/transformer-experiments/inference/model.py", line 19, in load_model
    engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(
  File "/home/anton/personal/transformer-experiments/env/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 726, in from_engine_args
    engine_config = engine_args.create_engine_config()
  File "/home/anton/personal/transformer-experiments/env/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 844, in create_engine_config
    model_config = self.create_model_config()
  File "/home/anton/personal/transformer-experiments/env/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 782, in create_model_config
    return ModelConfig(
  File "/home/anton/personal/transformer-experiments/env/lib/python3.10/site-packages/vllm/config.py", line 227, in __init__
    self.max_model_len = _get_and_verify_max_len(
  File "/home/anton/personal/transformer-experiments/env/lib/python3.10/site-packages/vllm/config.py", line 1739, in _get_and_verify_max_len
    assert "factor" in rope_scaling

The recent qwen2-vl merge added a special case for rope_scaling["type"] == "mrope" to the max-model-len verification (roughly sketched below): https://github.com/vllm-project/vllm/commit/3b7fea770f44369d077e40010bb4983ff3641535#diff-7eaad0b7dee0626bf29d10081b0f0c5e3ea15a4af97e7b182a4e0d35f8346953R1736
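For context, this is roughly what that verification does. It's a simplified sketch of the logic in `_get_and_verify_max_len`, not the exact vLLM code, and `check_rope_scaling` is just an illustrative name:

```python
# Simplified sketch (illustrative only) of the rope_scaling handling in
# vllm/config.py::_get_and_verify_max_len around vLLM 0.6.1.
def check_rope_scaling(rope_scaling, max_position_embeddings):
    derived_max_len = max_position_embeddings
    if rope_scaling is not None:
        rope_type = rope_scaling.get("type") or rope_scaling.get("rope_type")
        # "su"/"longrope" and, after the qwen2-vl merge, "mrope" are exempt.
        if rope_type not in ("su", "longrope", "mrope"):
            # Every other scaling type is expected to carry a "factor".
            assert "factor" in rope_scaling
            derived_max_len *= rope_scaling["factor"]
    return int(derived_max_len)
```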

But transformers is overriding this key to "default" for some reason:

            if self.rope_scaling["type"] == "mrope":
                self.rope_scaling["type"] = "default"

https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_vl/configuration_qwen2_vl.py#L240
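So the two pieces interact badly: the config ships "mrope", transformers rewrites it to "default" on load, and the check sketched above then demands a "factor" key that was never there. A minimal reproduction of that interaction, reusing the `check_rope_scaling` sketch; the rope_scaling and max_position_embeddings values are what I believe Qwen2-VL-7B ships, so treat them as illustrative:

```python
# rope_scaling roughly as shipped in Qwen2-VL's config.json (illustrative values)
rope_scaling = {"type": "mrope", "mrope_section": [16, 24, 24]}

# What the linked transformers code does when the config is loaded:
if rope_scaling["type"] == "mrope":
    rope_scaling["type"] = "default"

# vLLM now sees type == "default", which is not exempt, so it expects a
# "factor" key that mrope/default scaling never had:
check_rope_scaling(rope_scaling, max_position_embeddings=32768)
# -> AssertionError: assert "factor" in rope_scaling
```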

Do you know the correct way to load the model?

Before submitting a new issue...

  • [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

abacaj avatar Sep 12 '24 00:09 abacaj

The specific issue is:

The rope_scaling["type"] key is being overridden to "default" even if it is initially set to "mrope".

Try:

    if self.rope_scaling["type"] != "mrope": self.rope_scaling["type"] = "default"

This way, the original value of "mrope" will be preserved, allowing the model to load correctly.

SHRISH01 avatar Sep 12 '24 01:09 SHRISH01

> The specific issue is:
>
> The rope_scaling["type"] key is being overridden to "default" even if it is initially set to "mrope".
>
> Try:
>
>     if self.rope_scaling["type"] != "mrope": self.rope_scaling["type"] = "default"
>
> This way, the original value of "mrope" will be preserved, allowing the model to load correctly.

Uh is this an AI reply? Because the solution doesn't make sense...

abacaj avatar Sep 12 '24 01:09 abacaj

Which version of transformers are you using? It is a known bug in transformers, so you need to use the specific version (not just any dev version) mentioned in our docs.
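For anyone hitting this later, here is a quick diagnostic sketch (not from the docs) to check whether your transformers build still rewrites the key on load:

```python
import transformers
from transformers import AutoConfig

print(transformers.__version__)

cfg = AutoConfig.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
# On a fixed build this still reflects the mrope settings from config.json;
# on an affected build the type has been rewritten to "default".
print(cfg.rope_scaling)
```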

DarkLight1337 avatar Sep 12 '24 02:09 DarkLight1337

> Which version of transformers are you using? It is a known bug in transformers, so you need to use the specific version (not just any dev version) mentioned in our docs.

Got it, yeah, I see now that it was a recent change in transformers (I was using the main branch). Thanks!

abacaj avatar Sep 12 '24 02:09 abacaj

Is this specific version still the only one that works? I've tried newer versions of vLLM with newer versions of transformers and am still seeing this error.

thisiskofi avatar Feb 15 '25 19:02 thisiskofi

The latest versions of vLLM/transformers should work together. I suggest you re-download the model repo from HF Hub as well to get the latest version.
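If it helps, a minimal sketch of forcing a fresh download with huggingface_hub (adjust repo_id to the checkpoint you are actually serving):

```python
from huggingface_hub import snapshot_download

# Re-fetch the repo so any config.json fixes pushed upstream are picked up.
snapshot_download(repo_id="Qwen/Qwen2-VL-7B-Instruct", force_download=True)
```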

DarkLight1337 avatar Feb 16 '25 03:02 DarkLight1337

Confirming this works, thanks!

thisiskofi avatar Feb 17 '25 04:02 thisiskofi

Has this been solved? The error still occurs for me.

bruce2233 avatar Oct 09 '25 12:10 bruce2233