[Bug]: loading qwen2-vl-7b fails with error: `assert "factor" in rope_scaling`
Your current environment
The output of `python collect_env.py`
Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] lion-pytorch==0.1.2
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.23.5
[pip3] nvidia-cublas-cu11==11.10.3.66
[pip3] nvidia-cublas-cu12==12.1.3.1
[pip3] nvidia-cuda-cupti-cu11==11.7.101
[pip3] nvidia-cuda-cupti-cu12==12.1.105
[pip3] nvidia-cuda-nvrtc-cu11==11.7.99
[pip3] nvidia-cuda-nvrtc-cu12==12.1.105
[pip3] nvidia-cuda-runtime-cu11==11.7.99
[pip3] nvidia-cuda-runtime-cu12==12.1.105
[pip3] nvidia-cudnn-cu11==8.5.0.96
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu11==10.9.0.58
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu11==10.2.10.91
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu11==11.4.0.1
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu11==11.7.4.91
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-ml-py==12.555.43
[pip3] nvidia-nccl-cu11==2.14.3
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] nvidia-nvjitlink-cu12==12.3.52
[pip3] nvidia-nvtx-cu11==11.7.91
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] pynvml==11.5.0
[pip3] pyzmq==25.1.0
[pip3] sentence-transformers==2.2.2
[pip3] torch==2.4.0
[pip3] torchvision==0.19.0
[pip3] transformers==4.45.0.dev0
[pip3] transformers-stream-generator==0.0.4
[pip3] triton==3.0.0
[pip3] vllm-nccl-cu12==2.18.1.0.3.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.1@3fd2b0d21cd9ec78de410fdf8aa1de840e9ad77a
vLLM Build Flags
🐛 Describe the bug
Traceback (most recent call last):
File "/home/anton/personal/transformer-experiments/inference/vllm_multi.py", line 21, in <module>
run_server(args)
File "/home/anton/personal/transformer-experiments/inference/vllm_multi.py", line 9, in run_server
llm = load_model(args.model, 8192, args.gpu)
File "/home/anton/personal/transformer-experiments/inference/model.py", line 19, in load_model
engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(
File "/home/anton/personal/transformer-experiments/env/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 726, in from_engine_args
engine_config = engine_args.create_engine_config()
File "/home/anton/personal/transformer-experiments/env/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 844, in create_engine_config
model_config = self.create_model_config()
File "/home/anton/personal/transformer-experiments/env/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 782, in create_model_config
return ModelConfig(
File "/home/anton/personal/transformer-experiments/env/lib/python3.10/site-packages/vllm/config.py", line 227, in __init__
self.max_model_len = _get_and_verify_max_len(
File "/home/anton/personal/transformer-experiments/env/lib/python3.10/site-packages/vllm/config.py", line 1739, in _get_and_verify_max_len
assert "factor" in rope_scaling
The recent qwen2-vl merge added a check for rope_type (`if rope_type == "mrope"`): https://github.com/vllm-project/vllm/commit/3b7fea770f44369d077e40010bb4983ff3641535#diff-7eaad0b7dee0626bf29d10081b0f0c5e3ea15a4af97e7b182a4e0d35f8346953R1736
But huggingface is overriding this key to `"default"` for some reason:
if self.rope_scaling["type"] == "mrope":
self.rope_scaling["type"] = "default"
https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_vl/configuration_qwen2_vl.py#L240
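A quick way to see what the parsed config actually contains (a minimal sketch; assuming the `Qwen/Qwen2-VL-7B-Instruct` repo):

```python
# Inspect the rope_scaling dict that vLLM reads from the HF config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
print(config.rope_scaling)
# On the current transformers main branch this prints "type": "default"
# (and there is no "factor" key), which is what trips the assert above.
```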
Do you know what the correct way to load the model is?
Before submitting a new issue...
- [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
The specific issue is:
`rope_scaling["type"]` is being overridden to `"default"` even if it was initially set to `"mrope"`.
Try:
`if self.rope_scaling["type"] != "mrope": self.rope_scaling["type"] = "default"`
This way, the original value of `"mrope"` will be preserved, allowing the model to load correctly.
Uh is this an AI reply? Because the solution doesn't make sense...
Which version of transformers are you using? It is a known bug in transformers so you need to use the specific version (not just any dev version) as mentioned in our docs.
Got it, yeah, now I see it was a recent change to transformers (I was using the main branch), thanks!
Is this specific version still the only one that works? I've tried newer versions of vLLM with newer versions of transformers and I'm still seeing this error.
The latest versions of vLLM/transformers should work together. I suggest you re-download the model repo from HF Hub as well to get the latest version.
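If it helps, re-pulling the snapshot can look like this (a minimal sketch; the repo id is an assumption, and `force_download=True` just bypasses the local cache):

```python
# Re-download the model repo from the HF Hub so the updated config.json is used.
from huggingface_hub import snapshot_download

snapshot_download("Qwen/Qwen2-VL-7B-Instruct", force_download=True)
```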
Confirming this works, thanks!
Is it solved? The error remains for me.