inference Request: support the Qwen2.5-VL-72B-Instruct-AWQ model with vLLM inference

System Info

Server with 8 × 4090 GPUs (24 GB each)

Running Xinference with Docker?

[ ] docker
[x] pip install
[ ] installation from source

Version info

xinference=1.2.2

The command used to start Xinference

nohup env XINFERENCE_HOME=/home/xinference xinference-local --host 0.0.0.0 --port 9997 > /home/logs/xinference.log 2>&1 &

Reproduction

Model: https://huggingface.co/PointerHQ/Qwen2.5-VL-72B-Instruct-Pointer-AWQ
Model format: awq
Quantization: Int4
Inference engine: vLLM

Expected behavior

Launching the model on 4 GPUs currently fails with: Server error: 400 - [address=0.0.0.0:34175, pid=190427] Weight input_size_per_partition = 7392 is not divisible by min_thread_k = 128. Consider reducing tensor_parallel_size or running with --quantization gptq.

The model is in AWQ format, yet the launch fails with the error above. The expected behavior is that the AWQ-format model Qwen2.5-VL-72B-Instruct-AWQ can be deployed normally with vLLM.
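The divisibility check named in the error message can be sketched as follows. This is only an illustration, not vLLM's actual code: the helper names `partition_size` and `compatible_tp_sizes` are hypothetical, and the full weight dimension is assumed to be 7392 × 4, on the assumption that the reported per-partition size comes from an even split across the 4 GPUs.

```python
# Minimal sketch of the constraint reported in the error message.
# Assumption: the weight's input dimension is split evenly across GPUs,
# so the full size is the reported partition size (7392) times 4 GPUs.
FULL_INPUT_SIZE = 7392 * 4
MIN_THREAD_K = 128  # value quoted in the error message


def partition_size(full_size: int, tensor_parallel_size: int) -> int:
    """Return the per-GPU slice of a dimension split evenly across GPUs."""
    if full_size % tensor_parallel_size != 0:
        raise ValueError("dimension cannot be split evenly across GPUs")
    return full_size // tensor_parallel_size


def compatible_tp_sizes(full_size, min_thread_k=MIN_THREAD_K,
                        candidates=(1, 2, 4, 8)):
    """List the candidate GPU counts whose partition is a multiple of min_thread_k."""
    return [
        tp for tp in candidates
        if full_size % tp == 0 and (full_size // tp) % min_thread_k == 0
    ]


if __name__ == "__main__":
    for tp in (1, 2, 4, 8):
        size = partition_size(FULL_INPUT_SIZE, tp)
        print(tp, size, size % MIN_THREAD_K == 0)
    print(compatible_tp_sizes(FULL_INPUT_SIZE))
```

Under these assumptions, only a single-GPU partition passes this particular check among 1, 2, 4, and 8 GPUs. Other constraints in the inference engine (for example on attention heads or quantization group size) are not modeled here.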

Feb 13 '25 08:02 moshilangzi

What does "pointer" mean? I don't know whether this is a standard AWQ-quantized model, so it has not been added to the default supported models.

Feb 13 '25 08:02 qinxuye

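"What does "pointer" mean? I don't know whether this is a standard AWQ-quantized model, so it has not been added to the default supported models."

Mr. Qin, Pointer is a company, and it published this AWQ model. Is there any other AWQ model that can be deployed?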

Feb 13 '25 10:02 moshilangzi

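"What does "pointer" mean? I don't know whether this is a standard AWQ-quantized model, so it has not been added to the default supported models."

Could you add support for this model with vLLM inference? https://huggingface.co/Benasd/Qwen2.5-VL-72B-Instruct-AWQ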

Feb 14 '25 03:02 moshilangzi

The official AWQ-quantized models were released on 2025-02-25: https://www.modelscope.cn/models/Qwen/Qwen2.5-VL-7B-Instruct-AWQ; https://www.modelscope.cn/models/Qwen/Qwen2.5-VL-72B-Instruct-AWQ

Feb 26 '25 08:02 Pinealeye

Hello, I have received your email.

May 22 '25 11:05 Pinealeye