
An error occurred: The checkpoint you are trying to load has model type qwen2_vl but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

Open · kicks66 opened this issue 1 year ago · 1 comment

I'm getting the following error when using the vLLM template:

An error occurred: The checkpoint you are trying to load has model type qwen2_vl but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

I believe it's because the latest (development) version of Transformers is required:

pip install git+https://github.com/huggingface/transformers accelerate
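
After that install, a quick way to confirm the architecture is recognized; the model id here is just an example:

# Raises the same "does not recognize this architecture" error if the
# installed Transformers still lacks qwen2_vl support.
python -c "from transformers import AutoConfig; AutoConfig.from_pretrained('Qwen/Qwen2-VL-7B-Instruct')"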

Is it possible to install this on top of the existing worker image?
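
For context, this is roughly what I mean by installing over the top; a minimal Dockerfile sketch, where the base image name and tag are assumptions and should be swapped for whatever your endpoint is actually built from:

# Hypothetical Dockerfile: layer a newer Transformers over the
# existing worker-vllm image. The FROM line is a placeholder.
FROM runpod/worker-vllm:stable
RUN pip install --no-cache-dir git+https://github.com/huggingface/transformers accelerate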

kicks66 commented on Sep 23, 2024

Try using the qwenllm/qwenvl:latest container image and a Docker command similar to this:

python -m vllm.entrypoints.openai.api_server \
    --served-model-name Qwen2-VL-72B-Instruct-GPTQ-Int4 \
    --model Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4 \
    --dtype float16 \
    --gpu-memory-utilization 0.8 \
    --tensor-parallel-size 2 \
    --trust-remote-code \
    --max-model-len 8192 \
    --limit-mm-per-prompt image=5,video=1

The above works for me when I create pods (each worker has 2 x A40). Serverless endpoints don't work, sadly.
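
Once the server is up, you can smoke-test it with an OpenAI-compatible request; the host, port, and image URL below are placeholders, and the model field must match the --served-model-name above:

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen2-VL-72B-Instruct-GPTQ-Int4",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}},
                {"type": "text", "text": "Describe this image."}
            ]
        }]
    }'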

cris-almodovar commented on Oct 15, 2024