text-generation-inference
RuntimeError: Cannot load 'awq' weight when running Qwen2-VL-72B-Instruct-AWQ model
System Info
Hi all,
I encountered an issue when trying to run the Qwen/Qwen2-VL-72B-Instruct-AWQ model using the latest text-generation-inference Docker image (the same issue occurs with 3.0.1). The error message is:
RuntimeError: Cannot load `awq` weight, make sure the model is already quantized.
Here is the command I used to start the container:
docker run -d --runtime nvidia --gpus '"device=2"' --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-generation-inference:latest --model-id Qwen/Qwen2-VL-72B-Instruct-AWQ
I noticed a related issue (#2036), which seems to describe the same problem and was closed by #2233. However, the problem appears to persist.
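For context, a quick sanity check on the model repo itself: the error asks to "make sure the model is already quantized", and the usual signal for that is an AWQ `quantization_config` block in the checkpoint's `config.json`. The snippet below sketches that check against a hypothetical config excerpt (the keys follow the common AutoAWQ layout; the exact values for this repo are illustrative, not verified here):

```python
import json

# Hypothetical excerpt of the config.json shipped with an AWQ checkpoint
# such as Qwen/Qwen2-VL-72B-Instruct-AWQ; real values may differ.
config_json = """
{
    "model_type": "qwen2_vl",
    "quantization_config": {
        "quant_method": "awq",
        "bits": 4,
        "group_size": 128
    }
}
"""

config = json.loads(config_json)
quant = config.get("quantization_config", {})
is_awq = quant.get("quant_method") == "awq"
print(f"AWQ-quantized: {is_awq}, bits: {quant.get('bits')}")
```

If the config does declare `quant_method: "awq"` and the error still fires, the loader is presumably failing to find the packed AWQ tensors rather than the config flag.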
Information
- [x] Docker
- [ ] The CLI directly
Tasks
- [x] An officially supported command
- [ ] My own modifications
Reproduction
docker pull ghcr.io/huggingface/text-generation-inference:latest
docker run -d --runtime nvidia --gpus '"device=2"' --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-generation-inference:latest --model-id Qwen/Qwen2-VL-72B-Instruct-AWQ
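One variation I have not confirmed: passing the quantization method explicitly via the launcher's `--quantize` flag instead of relying on auto-detection (whether this changes anything for this model is an assumption):

```shell
# Hypothetical workaround sketch: tell the launcher explicitly that the
# checkpoint is AWQ-quantized rather than relying on auto-detection.
docker run -d --runtime nvidia --gpus '"device=2"' --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id Qwen/Qwen2-VL-72B-Instruct-AWQ \
  --quantize awq
```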
Expected behavior
The container should successfully start, and the model should load without errors.