text-generation-inference
RuntimeError: Cannot load 'awq' weight when running Qwen2-VL-72B-Instruct-AWQ model
System Info
Hi all,
I encountered an issue when trying to run the Qwen/Qwen2-VL-72B-Instruct-AWQ model using the latest text-generation-inference Docker image (the same issue occurs with 3.0.1). The error message is:
RuntimeError: Cannot load `awq` weight, make sure the model is already quantized.
Here is the command I used to start the container:
docker run -d --runtime nvidia --gpus '"device=2"' --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-generation-inference:latest --model-id Qwen/Qwen2-VL-72B-Instruct-AWQ
I noticed a related issue (#2036), which seems to describe the same problem and was closed by #2233. However, the problem appears to persist.
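For context, a quick sanity check on the model repo itself: the error asks to "make sure the model is already quantized", and the usual signal for that is an AWQ `quantization_config` block in the checkpoint's `config.json`. The snippet below sketches that check against a hypothetical config excerpt (the keys follow the common AutoAWQ layout; the exact values for this repo are illustrative, not verified here):

```python
import json

# Hypothetical excerpt of the config.json shipped with an AWQ checkpoint
# such as Qwen/Qwen2-VL-72B-Instruct-AWQ; real values may differ.
config_json = """
{
    "model_type": "qwen2_vl",
    "quantization_config": {
        "quant_method": "awq",
        "bits": 4,
        "group_size": 128
    }
}
"""

config = json.loads(config_json)
quant = config.get("quantization_config", {})
is_awq = quant.get("quant_method") == "awq"
print(f"AWQ-quantized: {is_awq}, bits: {quant.get('bits')}")
```

If the config does declare `quant_method: "awq"` and the error still fires, the loader is presumably failing to find the packed AWQ tensors rather than the config flag.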
Information
- [x] Docker
- [ ] The CLI directly
Tasks
- [x] An officially supported command
- [ ] My own modifications
Reproduction
docker pull ghcr.io/huggingface/text-generation-inference:latest
docker run -d --runtime nvidia --gpus '"device=2"' --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-generation-inference:latest --model-id Qwen/Qwen2-VL-72B-Instruct-AWQ
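One variation I have not confirmed: passing the quantization method explicitly via the launcher's `--quantize` flag instead of relying on auto-detection (whether this changes anything for this model is an assumption):

```shell
# Hypothetical workaround sketch: tell the launcher explicitly that the
# checkpoint is AWQ-quantized rather than relying on auto-detection.
docker run -d --runtime nvidia --gpus '"device=2"' --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id Qwen/Qwen2-VL-72B-Instruct-AWQ \
  --quantize awq
```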
Expected behavior
The container should successfully start, and the model should load without errors.