
RuntimeError: Cannot load 'awq' weight when running Qwen2-VL-72B-Instruct-AWQ model

Open edesalve opened this issue 10 months ago • 1 comment

System Info

Hi all,

I encountered an issue when trying to run the Qwen/Qwen2-VL-72B-Instruct-AWQ model with the latest text-generation-inference Docker image (the same issue occurs with 3.0.1). The error message is as follows:

RuntimeError: Cannot load `awq` weight, make sure the model is already quantized.

Here is the command I used to start the container:

docker run -d --runtime nvidia --gpus '"device=2"' --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-generation-inference:latest --model-id Qwen/Qwen2-VL-72B-Instruct-AWQ

I noticed a related issue (#2036) that seems to describe the same problem; it was closed by #2233, but the problem appears to persist.

Information

  • [x] Docker
  • [ ] The CLI directly

Tasks

  • [x] An officially supported command
  • [ ] My own modifications

Reproduction

docker pull ghcr.io/huggingface/text-generation-inference:latest
docker run -d --runtime nvidia --gpus '"device=2"' --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-generation-inference:latest --model-id Qwen/Qwen2-VL-72B-Instruct-AWQ
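As a possible workaround (an assumption on my part, not verified with this particular model), explicitly passing TGI's `--quantize awq` launcher flag might help, in case the loader fails to detect the quantization method from the model config:

```shell
# Same command as above, but with the quantization method set explicitly
# via TGI's --quantize flag (hypothetical workaround, untested here)
docker run -d --runtime nvidia --gpus '"device=2"' --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id Qwen/Qwen2-VL-72B-Instruct-AWQ \
  --quantize awq
```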

Expected behavior

The container should start successfully, and the model should load without errors.

edesalve — Jan 23 '25 09:01