Isotr0py
Can you try to add `--tokenizer=microsoft/Phi-4-mini-instruct` when serving the model? I suspect the Phi-4 tokenizer conversion is broken on the transformers side.
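For reference, a minimal sketch of a serve command with the tokenizer override (the local GGUF path is just a placeholder):

```bash
# Load the GGUF weights but take the tokenizer from the original HF repo,
# bypassing the tokenizer converted from the GGUF metadata.
vllm serve /path/to/phi-4-mini-instruct-q4.gguf \
    --tokenizer microsoft/Phi-4-mini-instruct \
    --max-model-len 4096
```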
Can you try this command for serving? I can generate reasonable outputs on the main branch with the q4 checkpoint in [microsoft/phi-4-gguf](https://huggingface.co/microsoft/phi-4-gguf): ``` vllm serve /tmp/phi-4-q4.gguf --max-model-len 4096 --dtype half --tokenizer microsoft/phi-4...
I also tried [Q6_K](https://huggingface.co/unsloth/phi-4-GGUF/blob/main/phi-4-Q6_K.gguf) but still can't reproduce the CUDA index error. Could you tell me which Q6 checkpoint you are using? ``` vllm serve /tmp/phi-4-Q6_K.gguf --max-model-len 4096...
There were some issues with HF's opt repo yesterday, which should have been fixed by now. I think re-running these CIs should be fine.
Please address pre-commit linting errors as well.
> Somehow the token is not handled properly during the profiling phase of vLLM. Can you point me in the right direction on how multimodal processing is done in vLLM? Because...
I would like to work on this model. However, it seems that `persimmon`, which `Fuyu-8B` uses as its language model, hasn't been supported yet. Maybe we can support it first.
We haven't supported `gguf` quantization on the CPU backend yet. You can try installing vLLM with the GPU backend.
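A minimal sketch, assuming a CUDA-capable environment (the default PyPI wheel targets the GPU backend):

```bash
# Install the default vLLM wheel, which is built for the CUDA (GPU) backend.
pip install vllm
```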
The released 0.5.4 version doesn't include GGUF support yet. You can build from source or install the latest nightly wheel: ```bash export VLLM_VERSION=0.5.4 # vLLM's main branch version is...
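If building from source instead, a minimal sketch following the standard vLLM source install (exact steps may vary with your environment):

```bash
# Build and install vLLM from the main branch, which includes GGUF support.
git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .
```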
@AlpinDale Thanks! I'm glad to push this forward by adding the quant kernels! I'm not familiar with the quantization in `ggml`, and it's difficult for me to implement the `mmq`/`mmvq` ops....