Michael Goin
@Isotr0py They fixed it! Please try the new `gguf==0.9.1` release https://pypi.org/project/gguf/
@Isotr0py I started looking at this and saw your issue. I decided it would be best for quantization methods to simply allow running as if they were linear methods...
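For context, a minimal sketch of that idea, with illustrative names rather than vLLM's actual classes: a quantization method whose weight creation and forward pass degrade gracefully to a plain dense linear layer.

```python
# Illustrative sketch only; the class and method names are assumptions,
# not vLLM's actual quantization API.
import torch
import torch.nn.functional as F


class QuantLinearMethod:
    """A quantization method that can also run as a plain linear method."""

    def create_weights(self, layer, input_size, output_size, params_dtype):
        # Allocate a dense weight so the layer still works when no
        # quantized checkpoint data has been loaded.
        layer.weight = torch.nn.Parameter(
            torch.empty(output_size, input_size, dtype=params_dtype)
        )

    def apply(self, layer, x, bias=None):
        # With no quantized weights present, this is exactly the
        # unquantized linear path.
        return F.linear(x, layer.weight, bias)
```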
@Isotr0py I think you can work on TP in another PR; it isn't required to land this one. But we should have a specific check + exception in the quantization config, rather than...
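A rough sketch of the kind of guard being suggested; the config class and its fields here are hypothetical, not vLLM's actual implementation:

```python
# Hypothetical sketch of a fail-fast check in the quantization config.
class GGUFQuantConfig:
    def __init__(self, tensor_parallel_size: int = 1):
        if tensor_parallel_size > 1:
            # Raise a clear exception up front instead of crashing later
            # inside the model loader.
            raise NotImplementedError(
                "GGUF quantization does not support tensor parallelism yet; "
                "please run with tensor_parallel_size=1."
            )
        self.tensor_parallel_size = tensor_parallel_size
```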
Thanks for the nice work @Isotr0py and @AlpinDale - let's keep it up with improvements!
@inuwamobarak @kalebeasilvadev This model works fine from my testing just now. I am able to spin up a vLLM server:

```bash
wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q2_K.gguf
vllm serve tinyllama-1.1b-chat-v1.0.Q2_K.gguf --tokenizer TinyLlama/TinyLlama-1.1B-Chat-v1.0
```

Are...
@congcongchen123 I think your issue is related to this PR: https://github.com/vllm-project/vllm/pull/7730. Would you mind giving it a try to see if it resolves the problem?
It looks like you are building with CUDA 11.8, which would definitely explain the issue with this kernel. I believe we should just not build the kernel in this...
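One way to express that gate, sketched in Python against the CUDA version torch reports; the helper name and the version threshold are assumptions, and vLLM's real build logic may differ:

```python
# Hypothetical sketch: only build the kernel when the CUDA toolkit is new
# enough. The helper name and the major-version threshold are assumptions.
import torch


def should_build_kernel(min_major: int = 12) -> bool:
    cuda = torch.version.cuda  # e.g. "11.8" or "12.1"; None for CPU builds
    if cuda is None:
        return False
    return int(cuda.split(".")[0]) >= min_major
```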
The failing entrypoints test seems related:

```
[2025-04-11T15:16:59Z] FAILED entrypoints/test_chat_utils.py::test_tool_calls_empty_does_not_throw[tool_chat_template_granite_20b_fc.jinja] - TypeError: apply_hf_chat_template() missing 1 required positional argument: 'tools'
[2025-04-11T15:16:59Z] FAILED entrypoints/test_chat_utils.py::test_tool_calls_empty_does_not_throw[tool_chat_template_hermes.jinja] - TypeError: apply_hf_chat_template() missing 1 required...
```
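The traceback itself points at the fix: the call sites must now pass `tools` explicitly. A self-contained stub reproducing the failure mode (this is an illustration, not vLLM's actual function or signature):

```python
# Stub illustrating the failure: adding a required `tools` parameter to a
# function breaks existing call sites that omit it.
def apply_hf_chat_template(conversation, tools):
    return {"conversation": conversation, "tools": tools}


try:
    apply_hf_chat_template(conversation=[])  # old call site, now broken
except TypeError as e:
    print(e)  # ... missing 1 required positional argument: 'tools'

apply_hf_chat_template(conversation=[], tools=[])  # updated call site
```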
Could you describe the major differences between this architecture and Llama? From a glance at the config and modeling code, it seems quite similar.
These seem like reasonable differences then. Could you fix the formatting first? It should just be a matter of running `pip install -r requirements-lint.txt` and then `./format.sh`.