[Bug]: sm75 cannot serve Qwen3 BNB 4-bit model
Your current environment
Docker image v0.8.5
vllm-openai-1 | (VllmWorkerProcess pid=149) WARNING 04-28 18:00:58 [utils.py:168] The model class Qwen3MoeForCausalLM has not defined packed_modules_mapping, this may lead to incorrect mapping of quantized or ignored modules
vllm-openai-1 | WARNING 04-28 18:00:58 [utils.py:168] The model class Qwen3MoeForCausalLM has not defined packed_modules_mapping, this may lead to incorrect mapping of quantized or ignored modules
vllm-openai-1 | (VllmWorkerProcess pid=149) ERROR 04-28 18:00:58 [multiproc_worker_utils.py:238] Exception in worker VllmWorkerProcess while processing method load_model.
vllm-openai-1 | (VllmWorkerProcess pid=149) ERROR 04-28 18:00:58 [multiproc_worker_utils.py:238] Traceback (most recent call last):
vllm-openai-1 | (VllmWorkerProcess pid=149) ERROR 04-28 18:00:58 [multiproc_worker_utils.py:238] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/multiproc_worker_utils.py", line 232, in _run_worker_process
vllm-openai-1 | (VllmWorkerProcess pid=149) ERROR 04-28 18:00:58 [multiproc_worker_utils.py:238] output = run_method(worker, method, args, kwargs)
vllm-openai-1 | (VllmWorkerProcess pid=149) ERROR 04-28 18:00:58 [multiproc_worker_utils.py:238] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-openai-1 | (VllmWorkerProcess pid=149) ERROR 04-28 18:00:58 [multiproc_worker_utils.py:238] File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2456, in run_method
vllm-openai-1 | (VllmWorkerProcess pid=149) ERROR 04-28 18:00:58 [multiproc_worker_utils.py:238] return func(*args, **kwargs)
vllm-openai-1 | (VllmWorkerProcess pid=149) ERROR 04-28 18:00:58 [multiproc_worker_utils.py:238] ^^^^^^^^^^^^^^^^^^^^^
vllm-openai-1 | (VllmWorkerProcess pid=149) ERROR 04-28 18:00:58 [multiproc_worker_utils.py:238] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 203, in load_model
vllm-openai-1 | (VllmWorkerProcess pid=149) ERROR 04-28 18:00:58 [multiproc_worker_utils.py:238] self.model_runner.load_model()
vllm-openai-1 | (VllmWorkerProcess pid=149) ERROR 04-28 18:00:58 [multiproc_worker_utils.py:238] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1111, in load_model
vllm-openai-1 | (VllmWorkerProcess pid=149) ERROR 04-28 18:00:58 [multiproc_worker_utils.py:238] self.model = get_model(vllm_config=self.vllm_config)
vllm-openai-1 | (VllmWorkerProcess pid=149) ERROR 04-28 18:00:58 [multiproc_worker_utils.py:238] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-openai-1 | (VllmWorkerProcess pid=149) ERROR 04-28 18:00:58 [multiproc_worker_utils.py:238] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
vllm-openai-1 | (VllmWorkerProcess pid=149) ERROR 04-28 18:00:58 [multiproc_worker_utils.py:238] return loader.load_model(vllm_config=vllm_config)
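For reference, the compute capability of the visible GPUs can be confirmed with a short PyTorch check (a sketch; Turing cards such as the T4 or RTX 20xx report (7, 5), i.e. sm75):

```python
import torch

# Print the compute capability of each visible GPU.
# Turing (sm75) devices report (7, 5).
for i in range(torch.cuda.device_count()):
    name = torch.cuda.get_device_name(i)
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {name} -> sm{major}{minor}")
```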
🐛 Describe the bug
vllm-openai:
  runtime: nvidia
  restart: always
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            device_ids: ['2', '3']
            capabilities: [gpu]
  volumes:
    - ~/.cache/huggingface:/root/.cache/huggingface
    - /home/hucd/models:/models
  environment:
    - HUGGING_FACE_HUB_TOKEN=
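The failure can also be reproduced outside Docker with a minimal script (a sketch; the checkpoint name is an assumption, and the tensor-parallel size matches the two GPUs reserved above):

```python
from vllm import LLM

# Minimal reproduction sketch, assuming a BNB 4-bit Qwen3 MoE
# checkpoint (model ID below is an assumption, not from the report).
llm = LLM(
    model="unsloth/Qwen3-30B-A3B-bnb-4bit",  # assumed checkpoint
    quantization="bitsandbytes",
    load_format="bitsandbytes",  # typically set alongside the flag above on v0.8.x
    tensor_parallel_size=2,
)
print(llm.generate(["Hello"]))
```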
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
First, this model lacks a packed_modules_mapping, and second, the MoE layers do not support BNB quantization, so this cannot be supported at the moment.
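For context, packed_modules_mapping tells the quantization loaders which original checkpoint modules were fused into a single vLLM module, so quantized (or ignored) weights can be matched. An illustrative sketch of what it looks like on other vLLM model classes (e.g. LlamaForCausalLM); Qwen3MoeForCausalLM did not define it at the time of this report:

```python
import torch.nn as nn

class Qwen3MoeForCausalLM(nn.Module):
    # Illustrative only, modeled on other vLLM model classes:
    # each fused module maps to the checkpoint modules it was
    # packed from.
    packed_modules_mapping = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    }
```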
Same issue with the GPTQ versions of Qwen3-30B-A3B.
The AWQ version does not work either.
+1 AWQ
How do we run Qwen3 MoE quantized, then?
+1 AWQ
+1 bnb
+1 AWQ
+1 gptq