Qwen2.5-VL-32B-Instruct fp16 image recognition issue
Describe the bug

Using intelanalytics/ipex-llm-serving-xpu:0.8.3-b18 to serve Qwen2.5-VL-32B-Instruct: with low bit set to fp16, the model returns no image description, while the same request with low bit set to fp8 works fine.
How to reproduce

Steps to reproduce the error:

```bash
#!/bin/bash
export DOCKER_IMAGE=intelanalytics/ipex-llm-serving-xpu:0.8.3-b18
export CONTAINER_NAME=b18-test

sudo docker run -itd \
  --net=host \
  --device=/dev/dri \
  --privileged \
  -v /your_model_path:/llm/models/ \
  --name=$CONTAINER_NAME \
  --shm-size="16g" \
  --entrypoint /bin/bash \
  $DOCKER_IMAGE
```

Then enter the container and start the service:

```bash
docker exec -it b18-test /bin/bash

export MODEL_PATH="/llm/models/Qwen2.5-VL-32B-Instruct/"
export SERVED_MODEL_NAME="Qwen2.5-VL-32B-Instruct"
export TENSOR_PARALLEL_SIZE=8
bash start-vllm-service.sh
```
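The low-bit mode is the only variable between the working and failing runs. Exactly how it is selected depends on the start-vllm-service.sh bundled in this image; the sketch below assumes the script honors a LOAD_IN_LOW_BIT environment variable (the variable name is an assumption — the script may instead hard-code an equivalent --load-in-low-bit option for the ipex-llm vLLM entrypoint):

```bash
# Assumption: start-vllm-service.sh reads LOAD_IN_LOW_BIT (or forwards an
# equivalent --load-in-low-bit option to the ipex-llm vLLM entrypoint).
export LOAD_IN_LOW_BIT=fp8     # fp8: image description is returned
# export LOAD_IN_LOW_BIT=fp16  # fp16: no image description is returned
bash start-vllm-service.sh
```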
Then send a multimodal request (the prompt "图片里有什么?" asks "What is in the picture?"):

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen2.5-VL-32B-Instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "图片里有什么?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 128
  }'
```
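To compare the two low-bit settings quickly, the assistant text can be pulled out of the response. A minimal sketch, assuming jq is installed and that request.json holds the JSON payload shown above (both are illustrative, not part of the original report):

```bash
# Print only the assistant message from the completion response.
# Under fp8 this prints an image description; under fp16 it comes back
# empty in the failing case described above.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @request.json | jq -r '.choices[0].message.content'
```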
Screenshots
The first screenshot is the fp8 run, which outputs a result. The second is the fp16 run, which does not output a recognition result.
Environment information

8 × Arc A770
Additional context
Will be fixed by this PR.