Qwen2.5-VL-32B-Instruct fp16 image recognition issue
Describe the bug

Using intelanalytics/ipex-llm-serving-xpu:0.8.3-b18 to serve Qwen2.5-VL-32B-Instruct: with low bit set to fp16, the model returns no image description, while the same request with low bit set to fp8 works fine.
How to reproduce

Steps to reproduce the error:

```bash
#!/bin/bash
export DOCKER_IMAGE=intelanalytics/ipex-llm-serving-xpu:0.8.3-b18
export CONTAINER_NAME=b18-test

sudo docker run -itd \
  --net=host \
  --device=/dev/dri \
  --privileged \
  -v /your_model_path:/llm/models/ \
  --name=$CONTAINER_NAME \
  --shm-size="16g" \
  --entrypoint /bin/bash \
  $DOCKER_IMAGE
```

Then enter the container and start the service:

```bash
docker exec -it b18-test /bin/bash

export MODEL_PATH="/llm/models/Qwen2.5-VL-32B-Instruct/"
export SERVED_MODEL_NAME="Qwen2.5-VL-32B-Instruct"
export TENSOR_PARALLEL_SIZE=8
bash start-vllm-service.sh
```
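The low-bit mode is the only variable between the working and failing runs. Exactly how it is selected depends on the start-vllm-service.sh bundled in this image; the sketch below assumes the script honors a LOAD_IN_LOW_BIT environment variable (the variable name is an assumption — the script may instead hard-code an equivalent --load-in-low-bit option for the ipex-llm vLLM entrypoint):

```bash
# Assumption: start-vllm-service.sh reads LOAD_IN_LOW_BIT (or forwards an
# equivalent --load-in-low-bit option to the ipex-llm vLLM entrypoint).
export LOAD_IN_LOW_BIT=fp8     # fp8: image description is returned
# export LOAD_IN_LOW_BIT=fp16  # fp16: no image description is returned
bash start-vllm-service.sh
```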
Then send a multimodal request (the prompt "图片里有什么?" asks "What is in the picture?"):

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen2.5-VL-32B-Instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "图片里有什么?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 128
  }'
```
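To compare the two low-bit settings quickly, the assistant text can be pulled out of the response. A minimal sketch, assuming jq is installed and that request.json holds the JSON payload shown above (both are illustrative, not part of the original report):

```bash
# Print only the assistant message from the completion response.
# Under fp8 this prints an image description; under fp16 it comes back
# empty in the failing case described above.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @request.json | jq -r '.choices[0].message.content'
```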
Screenshots
The first screenshot is the fp8 run, which outputs a result. The second is the fp16 run, which does not output a recognition result.
Environment information

8 × Arc A770
Additional context
Will be fixed by this PR.