
An error occurred: The checkpoint you are trying to load has model type qwen2_vl but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

Open · kicks66 opened this issue 1 year ago · 1 comment

I'm getting the following error when using the vLLM template:

An error occurred: The checkpoint you are trying to load has model type qwen2_vl but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

I believe it's because the latest (development) version of Transformers is required:

pip install git+https://github.com/huggingface/transformers accelerate
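
After that install, a quick way to confirm the architecture is recognized; the model id here is just an example:

# Raises the same "does not recognize this architecture" error if the
# installed Transformers still lacks qwen2_vl support.
python -c "from transformers import AutoConfig; AutoConfig.from_pretrained('Qwen/Qwen2-VL-7B-Instruct')"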

Is it possible to install this on top of the existing worker image?
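
For context, this is roughly what I mean by installing over the top; a minimal Dockerfile sketch, where the base image name and tag are assumptions and should be swapped for whatever your endpoint is actually built from:

# Hypothetical Dockerfile: layer a newer Transformers over the
# existing worker-vllm image. The FROM line is a placeholder.
FROM runpod/worker-vllm:stable
RUN pip install --no-cache-dir git+https://github.com/huggingface/transformers accelerate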

kicks66 commented on Sep 23, 2024

Try using the qwenllm/qwenvl:latest container image and a Docker command similar to this:

python -m vllm.entrypoints.openai.api_server \
    --served-model-name Qwen2-VL-72B-Instruct-GPTQ-Int4 \
    --model Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4 \
    --dtype float16 \
    --gpu-memory-utilization 0.8 \
    --tensor-parallel-size 2 \
    --trust-remote-code \
    --max-model-len 8192 \
    --limit-mm-per-prompt image=5,video=1

The above works for me when I create pods (each worker has 2 x A40). Serverless endpoints don't work, sadly.
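
Once the server is up, you can smoke-test it with an OpenAI-compatible request; the host, port, and image URL below are placeholders, and the model field must match the --served-model-name above:

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen2-VL-72B-Instruct-GPTQ-Int4",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}},
                {"type": "text", "text": "Describe this image."}
            ]
        }]
    }'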

cris-almodovar commented on Oct 15, 2024