[Feature]: Support Gemma3 GGUF
🚀 The feature, motivation and pitch
Need support for Gemma 3 GGUF.
I also tried Gemma 3 GGUF (https://huggingface.co/bartowski/google_gemma-3-27b-it-GGUF). An hour ago I downloaded the latest vLLM code and built everything from source, including the latest version of transformers: pip install git+https://github.com/huggingface/[email protected] Here is the error on startup:
File "/home/hackey/miniconda3/envs/python312/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap self.run() File "/home/hackey/miniconda3/envs/python312/lib/python3.12/multiprocessing/process.py", line 108, in run self._target(*self._args, **self._kwargs) File "/home/hackey/AI/vllm/vllm/engine/multiprocessing/engine.py", line 413, in run_mp_engine raise e File "/home/hackey/AI/vllm/vllm/engine/multiprocessing/engine.py", line 402, in run_mp_engine engine = MQLLMEngine.from_engine_args(engine_args=engine_args, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/hackey/AI/vllm/vllm/engine/multiprocessing/engine.py", line 120, in from_engine_args engine_config = engine_args.create_engine_config(usage_context) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/hackey/AI/vllm/vllm/engine/arg_utils.py", line 1204, in create_engine_config model_config = self.create_model_config() ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/hackey/AI/vllm/vllm/engine/arg_utils.py", line 1130, in create_model_config return ModelConfig( ^^^^^^^^^^^^ File "/home/hackey/AI/vllm/vllm/config.py", line 327, in init hf_config = get_config(self.hf_config_path or self.model, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/hackey/AI/vllm/vllm/transformers_utils/config.py", line 280, in get_config config_dict, _ = PretrainedConfig.get_config_dict( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/hackey/AI/vllm/venv/lib/python3.12/site-packages/transformers/configuration_utils.py", line 594, in get_config_dict config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/hackey/AI/vllm/venv/lib/python3.12/site-packages/transformers/configuration_utils.py", line 685, in _get_config_dict config_dict = load_gguf_checkpoint(resolved_config_file, return_tensors=False)["config"] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/hackey/AI/vllm/venv/lib/python3.12/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 399, in load_gguf_checkpoint raise ValueError(f"GGUF model with architecture {architecture} is not supported yet.") ValueError: GGUF model with architecture gemma3 is not supported yet.```
Alternatives
No response
Additional context
https://github.com/vllm-project/vllm/issues/14723
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
Be me, see gemma come out.
People say it's coal.
Screw it, I'll try it.
wot backend? no exllama, llama.cpp has no pictchas, hey what about vllm?
It supports image models and GGUF!? Tensor Parallel go brrrr
here I come to chat with memes
Build vLLM from source
GGUF model with architecture gemma3 is not supported yet.
Wait a few hours and see a commit saying "Gemma 3 support"
Excitedly build vLLM again.
GGUF model with architecture gemma3 is not supported yet.
Very funny! But if you look carefully, you will see that a vLLM developer asked for a new ticket to be created to request this functionality))
The title should be "quantized support", so it also covers the AWQ version.
+1 to this feature request
Anyone got a solution for this?
The PR is in the pipeline. Just merge it and compile. I didn't see anyone saying it doesn't work. Wonder if the visual portion is accounted for.
+1 to this feature request
+1 to this feature request
+1 to this feature request
+1 to this feature request
+1
+1
+1
+1
Guys, maybe you should stop writing empty comments? It would be better to add a reaction to the first message, as is customary in GitHub communities. After all, every "+1" comment sends a notification to all participants, and it is unlikely to speed up development.
The OP has this at the last line
File "/home/hackey/AI/vllm/venv/lib/python3.12/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 399, in load_gguf_checkpoint
raise ValueError(f"GGUF model with architecture {architecture} is not supported yet.")
ValueError: GGUF model with architecture gemma3 is not supported yet.```
I think this is more of an issue with transformers from Hugging Face.
Indeed, the transformers library doesn't support GGUF inference; only "loading models stored in the GGUF format for further training or finetuning" is available. See the transformers docs on GGUF.
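For illustration, this is roughly the extent of the GGUF support transformers documents: dequantizing a checkpoint of an architecture already on its GGUF list back into a regular torch model. A sketch, assuming a Llama-family GGUF repo; the repo and file names are illustrative:

```python
# Sketch of what transformers' documented GGUF support covers: loading and
# dequantizing a GGUF checkpoint of a *supported* architecture into a regular
# full-precision torch model (useful for further training/fine-tuning, not for
# fast quantized inference). Repo and file names below are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"   # example repo
gguf_file = "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"   # example quant file

tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
```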
What is the solution? Does this mean the GGUF format can't be used for inference with vLLM?
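For what it's worth, vLLM can run GGUF checkpoints whose architecture is already handled by transformers' GGUF converter; the blocker in this issue is specifically the gemma3 architecture. Below is a sketch of how a supported GGUF is normally run; the local path and tokenizer repo are illustrative:

```python
# Sketch of GGUF inference with vLLM for an architecture that transformers'
# GGUF loader already supports. The local .gguf path and the tokenizer repo
# are illustrative; passing the original HF tokenizer alongside the GGUF
# checkpoint is the commonly recommended setup.
from vllm import LLM, SamplingParams

llm = LLM(
    model="./tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",   # local GGUF file (example)
    tokenizer="TinyLlama/TinyLlama-1.1B-Chat-v1.0",   # base model tokenizer (example)
)
outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.8, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```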
any update :(
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!