Isotr0py
@AlpinDale I agree that we can directly port the [Aphrodite Engine](https://github.com/PygmalionAI/aphrodite-engine)'s dequantization kernels to vLLM. But I think we can also keep the transformers-integration dequantization path for the CPU backend until...
@AlpinDale I think we can discuss this further on Discord. How can I reach you there?
Currently, installing `gguf` from PyPI only gets `gguf==0.6.0`, which is an old release from months ago. However, imatrix quantization requires the newest version, which needs to be installed from...
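For reference, a version guard along these lines could catch the old release early (a rough sketch; the exact minimum version needed for imatrix support is an assumption on my part):

```python
# Rough sketch: fail fast if the gguf package installed from PyPI is the old
# 0.6.0 release. The exact minimum version needed for imatrix quantization is
# an assumption here, not a verified requirement.
from importlib.metadata import version

from packaging.version import Version

if Version(version("gguf")) <= Version("0.6.0"):
    raise RuntimeError(
        "The installed gguf package is too old for imatrix-quantized models; "
        "please install the latest gguf-py from source.")
```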
Nice! I will check it out and add tests for qwen2 and imatrix!
@mgoin Could you please take a look at this once again? The way to handle `get_quant_method` for the vocab embedding is confusing me. Could you give some suggestions about this? Thanks!
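For context, the dispatch I have in mind looks roughly like this (a sketch only; `GGUFLinearMethod` and `GGUFEmbeddingMethod` are placeholder names for whatever the quant methods end up being called):

```python
# Rough sketch of how GGUFConfig.get_quant_method could dispatch on the layer
# type. GGUFLinearMethod / GGUFEmbeddingMethod are illustrative placeholders.
from vllm.model_executor.layers.linear import LinearBase
from vllm.model_executor.layers.vocab_parallel_embedding import (
    VocabParallelEmbedding)


def get_quant_method(self, layer, prefix: str):
    if isinstance(layer, LinearBase):
        return GGUFLinearMethod(self)
    # The confusing part: the vocab embedding also carries GGUF-quantized
    # weights, so it cannot fall back to the default unquantized embedding
    # path and needs its own method.
    if isinstance(layer, VocabParallelEmbedding):
        return GGUFEmbeddingMethod(self)
    return None
```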
Tensor parallelism doesn't work yet, because we haven't considered the distributed case with `tp_size` and `tp_rank` when modifying the `weight_loader` for GGUF quantization. I will try to fix the tensor...
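To make the missing piece concrete, the loader would need to slice each tensor by rank, roughly like this (a simplified sketch that ignores the GGUF block packing, which is exactly what makes the real fix non-trivial):

```python
# Simplified sketch of sharding a loaded weight along its output dimension by
# tensor-parallel rank. Real GGUF tensors are block-packed, so the shard
# boundaries must also be aligned to quantization blocks, which this ignores.
import torch

from vllm.distributed import (get_tensor_model_parallel_rank,
                              get_tensor_model_parallel_world_size)


def shard_for_tp(loaded_weight: torch.Tensor,
                 output_dim: int = 0) -> torch.Tensor:
    tp_rank = get_tensor_model_parallel_rank()
    tp_size = get_tensor_model_parallel_world_size()
    shard_size = loaded_weight.shape[output_dim] // tp_size
    return loaded_weight.narrow(output_dim, tp_rank * shard_size, shard_size)
```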
OK, I have added a check to raise an exception for `tp_size > 1` when initializing `GGUFConfig`.
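The guard is essentially this (a sketch; where exactly it lives inside `GGUFConfig` may differ in the final diff):

```python
# Sketch of the guard added during GGUFConfig initialization: GGUF
# quantization only supports tensor_parallel_size == 1 for now.
from vllm.distributed import get_tensor_model_parallel_world_size

if get_tensor_model_parallel_world_size() > 1:
    raise ValueError("GGUF quantization doesn't support tensor parallelism "
                     "yet, please use tensor_parallel_size=1.")
```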
@vbiral Thanks for reporting! It seems that `gguf_to_hf_name_map` didn't handle `rope_freqs` correctly. I will take a look and fix it.
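My current guess at the shape of the fix (purely hypothetical until I've confirmed the culprit; `rope_freqs.weight` being the offending GGUF-only tensor is an assumption):

```python
# Hypothetical fix: drop GGUF-side tensors that have no HF counterpart (such
# as rope_freqs.weight) from the name map instead of letting them be mapped
# onto model weights.
def filter_gguf_only_tensors(name_map: dict[str, str]) -> dict[str, str]:
    gguf_only_tensors = {"rope_freqs.weight"}
    return {
        gguf_name: hf_name
        for gguf_name, hf_name in name_map.items()
        if gguf_name not in gguf_only_tensors
    }
```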
I think we should also add an audio example for phi-4-mm, since it supports audio inputs as well.
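Something along these lines, mirroring the existing audio examples, should work (a sketch; I haven't verified the exact prompt placeholder Phi-4-multimodal expects, so treat the template below as an assumption):

```python
# Sketch of an offline audio example for Phi-4-multimodal, mirroring the
# existing audio_language examples. The prompt placeholder and sampling
# parameters are assumptions, not the verified template for this model.
from vllm import LLM, SamplingParams
from vllm.assets.audio import AudioAsset

llm = LLM(model="microsoft/Phi-4-multimodal-instruct",
          trust_remote_code=True,
          max_model_len=4096,
          limit_mm_per_prompt={"audio": 1})

audio_and_sr = AudioAsset("mary_had_lamb").audio_and_sample_rate

prompt = "<|user|><|audio_1|>Transcribe this audio clip.<|end|><|assistant|>"
outputs = llm.generate(
    {
        "prompt": prompt,
        "multi_modal_data": {"audio": [audio_and_sr]},
    },
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```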
I haven't had a machine to test the 38B model yet. Can you check whether smaller models like 8B/14B also have this issue?