
[Feature]: Support Gemma3 GGUF

Open hackey opened this issue 8 months ago • 10 comments

🚀 The feature, motivation and pitch

Need support for Gemma 3 GGUF.

I also tried Gemma 3 GGUF (https://huggingface.co/bartowski/google_gemma-3-27b-it-GGUF). An hour ago I downloaded the latest vLLM code and built everything from source, including the latest version of transformers: pip install git+https://github.com/huggingface/[email protected]. Here is the error on startup:

File "/home/hackey/miniconda3/envs/python312/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap self.run() File "/home/hackey/miniconda3/envs/python312/lib/python3.12/multiprocessing/process.py", line 108, in run self._target(*self._args, **self._kwargs) File "/home/hackey/AI/vllm/vllm/engine/multiprocessing/engine.py", line 413, in run_mp_engine raise e File "/home/hackey/AI/vllm/vllm/engine/multiprocessing/engine.py", line 402, in run_mp_engine engine = MQLLMEngine.from_engine_args(engine_args=engine_args, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/hackey/AI/vllm/vllm/engine/multiprocessing/engine.py", line 120, in from_engine_args engine_config = engine_args.create_engine_config(usage_context) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/hackey/AI/vllm/vllm/engine/arg_utils.py", line 1204, in create_engine_config model_config = self.create_model_config() ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/hackey/AI/vllm/vllm/engine/arg_utils.py", line 1130, in create_model_config return ModelConfig( ^^^^^^^^^^^^ File "/home/hackey/AI/vllm/vllm/config.py", line 327, in init hf_config = get_config(self.hf_config_path or self.model, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/hackey/AI/vllm/vllm/transformers_utils/config.py", line 280, in get_config config_dict, _ = PretrainedConfig.get_config_dict( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/hackey/AI/vllm/venv/lib/python3.12/site-packages/transformers/configuration_utils.py", line 594, in get_config_dict config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/hackey/AI/vllm/venv/lib/python3.12/site-packages/transformers/configuration_utils.py", line 685, in _get_config_dict config_dict = load_gguf_checkpoint(resolved_config_file, return_tensors=False)["config"] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/hackey/AI/vllm/venv/lib/python3.12/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 399, in load_gguf_checkpoint raise ValueError(f"GGUF model with architecture {architecture} is not supported yet.") ValueError: GGUF model with architecture gemma3 is not supported yet.```

Alternatives

No response

Additional context

https://github.com/vllm-project/vllm/issues/14723

Before submitting a new issue...

  • [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

hackey avatar Mar 13 '25 10:03 hackey

>Be me, see gemma come out.
>People say it's coal.
>Screw it, I'll try it.
>wot backend? no exllama, llama.cpp has no pictchas, hey what about vllm?
>It supports image models and GGUF!? Tensor Paralell go brrrr
>here I come to chat with memes
>Build VLLM from source
>GGUF model with architecture gemma3 is not supported yet.
>Wait a few hours and see a commit saying "Gemma 3 support"
>Excitedly build vLLM again.
>GGUF model with architecture gemma3 is not supported yet.

Ph0rk0z avatar Mar 13 '25 14:03 Ph0rk0z

> Be me, see gemma come out. People say it's coal. […]

Very funny! If you look closely, you'll see that a vLLM developer asked for a new ticket to be created to request this functionality. :)

hackey avatar Mar 13 '25 15:03 hackey

The title should be "quantized support" so that it includes the AWQ version too.

molavy2003 avatar Mar 14 '25 02:03 molavy2003

+1 to this feature request

diegoasua avatar Mar 26 '25 02:03 diegoasua

Anyone got a solution for this?

FancyCodeMaster avatar Mar 26 '25 08:03 FancyCodeMaster

The PR is in the pipeline. Just merge it and compile. I didn't see anyone saying it doesn't work. Wonder if the visual portion is accounted for.

Ph0rk0z avatar Mar 26 '25 13:03 Ph0rk0z

+1 to this feature request

bebilli avatar Mar 29 '25 19:03 bebilli

+1 to this feature request

nvsthinh avatar Mar 30 '25 10:03 nvsthinh

+1 to this feature request

tranthanhbinh1 avatar Apr 06 '25 08:04 tranthanhbinh1

+1 to this feature request

DTK-QI avatar Apr 12 '25 06:04 DTK-QI

+1

dkkb avatar Apr 13 '25 06:04 dkkb

+1

JohnConnor123 avatar Apr 13 '25 15:04 JohnConnor123

+1

MTDickens avatar Apr 14 '25 13:04 MTDickens

+1

PhungVanHoa avatar Apr 15 '25 10:04 PhungVanHoa

Guys, maybe you should stop posting empty comments? It would be better to react to the first message with a thumbs-up, as is customary in GitHub communities. Every "+1" comment sends a notification to all participants, and it is unlikely to speed up development.

hackey avatar Apr 15 '25 10:04 hackey

The OP has this as the last lines:

File "/home/hackey/AI/vllm/venv/lib/python3.12/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 399, in load_gguf_checkpoint
raise ValueError(f"GGUF model with architecture {architecture} is not supported yet.")
ValueError: GGUF model with architecture gemma3 is not supported yet.

I think this is more of an issue with transformers from Hugging Face.
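A quick way to confirm is to call the same transformers helper from the traceback directly on the downloaded file (a sketch only; the local path is a hypothetical example):

```python
# Sketch: probe whether transformers can parse a given GGUF file's config at all,
# using the helper that appears in the traceback above.
from transformers.modeling_gguf_pytorch_utils import load_gguf_checkpoint

try:
    cfg = load_gguf_checkpoint("./google_gemma-3-27b-it-Q4_K_M.gguf", return_tensors=False)["config"]
    print("architecture supported by transformers, config:", cfg)
except ValueError as err:
    # e.g. "GGUF model with architecture gemma3 is not supported yet."
    print("not supported by transformers:", err)
```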

surak avatar Apr 22 '25 11:04 surak

> The OP has this as the last lines:
> ValueError: GGUF model with architecture gemma3 is not supported yet.
> I think this is more of an issue with transformers from Hugging Face.

Indeed, the transformers library doesn't support GGUF inference; only "loading models stored in the GGUF format for further training or finetuning" is available. See the transformers docs on GGUF.
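For completeness, what transformers does offer is dequantized loading via the `gguf_file` argument, roughly like this (a sketch, assuming an architecture transformers already supports; the repo and file names are only examples):

```python
# Sketch of transformers' GGUF support: the checkpoint is dequantized and loaded
# as regular torch weights, which is aimed at further training/finetuning rather
# than quantized inference. Repo/file names below are examples (assumptions).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"   # example repo (assumption)
gguf_file = "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"   # example file (assumption)

tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
```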

MTDickens avatar Apr 22 '25 13:04 MTDickens

What is the solution? Does this mean that GGUF-format models cannot be used for inference with vLLM?

rh920 avatar May 21 '25 09:05 rh920

any update :(

iEddie-cmd avatar Jun 07 '25 21:06 iEddie-cmd

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions[bot] avatar Sep 06 '25 02:09 github-actions[bot]

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!

github-actions[bot] avatar Oct 06 '25 02:10 github-actions[bot]