[Feature]: Support Gemma3 GGUF
🚀 The feature, motivation and pitch
Need support for Gemma 3 GGUF.
I also tried Gemma 3 GGUF (https://huggingface.co/bartowski/google_gemma-3-27b-it-GGUF). An hour ago I downloaded the latest vLLM code and built everything from source, including the latest version of transformers: pip install git+https://github.com/huggingface/[email protected] Here is the error on startup:
File "/home/hackey/miniconda3/envs/python312/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap self.run() File "/home/hackey/miniconda3/envs/python312/lib/python3.12/multiprocessing/process.py", line 108, in run self._target(*self._args, **self._kwargs) File "/home/hackey/AI/vllm/vllm/engine/multiprocessing/engine.py", line 413, in run_mp_engine raise e File "/home/hackey/AI/vllm/vllm/engine/multiprocessing/engine.py", line 402, in run_mp_engine engine = MQLLMEngine.from_engine_args(engine_args=engine_args, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/hackey/AI/vllm/vllm/engine/multiprocessing/engine.py", line 120, in from_engine_args engine_config = engine_args.create_engine_config(usage_context) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/hackey/AI/vllm/vllm/engine/arg_utils.py", line 1204, in create_engine_config model_config = self.create_model_config() ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/hackey/AI/vllm/vllm/engine/arg_utils.py", line 1130, in create_model_config return ModelConfig( ^^^^^^^^^^^^ File "/home/hackey/AI/vllm/vllm/config.py", line 327, in init hf_config = get_config(self.hf_config_path or self.model, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/hackey/AI/vllm/vllm/transformers_utils/config.py", line 280, in get_config config_dict, _ = PretrainedConfig.get_config_dict( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/hackey/AI/vllm/venv/lib/python3.12/site-packages/transformers/configuration_utils.py", line 594, in get_config_dict config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/hackey/AI/vllm/venv/lib/python3.12/site-packages/transformers/configuration_utils.py", line 685, in _get_config_dict config_dict = load_gguf_checkpoint(resolved_config_file, return_tensors=False)["config"] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/hackey/AI/vllm/venv/lib/python3.12/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 399, in load_gguf_checkpoint raise ValueError(f"GGUF model with architecture {architecture} is not supported yet.") ValueError: GGUF model with architecture gemma3 is not supported yet.```
Alternatives
No response
Additional context
https://github.com/vllm-project/vllm/issues/14723
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
Be me, see gemma come out.
People say it's coal.
Screw it, I'll try it.
wot backend? no exllama, llama.cpp has no pictchas, hey what about vllm?
It supports image models and GGUF!? Tensor Parallel go brrrr
here I come to chat with memes
Build vLLM from source
GGUF model with architecture gemma3 is not supported yet.
Wait a few hours and see a commit saying "Gemma 3 support"
Excitedly build vLLM again.
GGUF model with architecture gemma3 is not supported yet.
Very funny! But if you look carefully, you will see that a vLLM developer asked for a new ticket to be created to request this functionality))
The title should be "quantized support", so it also covers the AWQ version.
+1 to this feature request
Anyone got a solution for this?
The PR is in the pipeline. Just merge it and compile. I didn't see anyone saying it doesn't work. Wonder if the visual portion is accounted for.
+1 to this feature request
+1 to this feature request
+1 to this feature request
+1 to this feature request
+1
+1
+1
+1
Guys, maybe you should stop writing empty comments? It would be better to add a reaction to the first message, as is customary in GitHub communities. After all, every "+1" comment sends a notification to all participants, and it is unlikely to speed up development.
The OP has this at the last line
File "/home/hackey/AI/vllm/venv/lib/python3.12/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 399, in load_gguf_checkpoint
raise ValueError(f"GGUF model with architecture {architecture} is not supported yet.")
ValueError: GGUF model with architecture gemma3 is not supported yet.```
I think this is more of an issue with transformers from Hugging Face.
Indeed, the transformers library doesn't support GGUF inference; only "loading models stored in the GGUF format for further training or finetuning" is available. See the transformers docs on GGUF.
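For illustration, this is roughly the extent of the GGUF support transformers documents: dequantizing a checkpoint of an architecture already on its GGUF list back into a regular torch model. A sketch, assuming a Llama-family GGUF repo; the repo and file names are illustrative:

```python
# Sketch of what transformers' documented GGUF support covers: loading and
# dequantizing a GGUF checkpoint of a *supported* architecture into a regular
# full-precision torch model (useful for further training/fine-tuning, not for
# fast quantized inference). Repo and file names below are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"   # example repo
gguf_file = "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"   # example quant file

tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
```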
What is the solution? Does this mean the GGUF format can't be used for inference with vLLM?
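For what it's worth, vLLM can run GGUF checkpoints whose architecture is already handled by transformers' GGUF converter; the blocker in this issue is specifically the gemma3 architecture. Below is a sketch of how a supported GGUF is normally run; the local path and tokenizer repo are illustrative:

```python
# Sketch of GGUF inference with vLLM for an architecture that transformers'
# GGUF loader already supports. The local .gguf path and the tokenizer repo
# are illustrative; passing the original HF tokenizer alongside the GGUF
# checkpoint is the commonly recommended setup.
from vllm import LLM, SamplingParams

llm = LLM(
    model="./tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",   # local GGUF file (example)
    tokenizer="TinyLlama/TinyLlama-1.1B-Chat-v1.0",   # base model tokenizer (example)
)
outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.8, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```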
any update :(
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!