Isotr0py

Results 139 comments of Isotr0py

@AlpinDale I agree that we can directly port the [Aphrodite Engine](https://github.com/PygmalionAI/aphrodite-engine)'s dequantization kernels to vLLM. But I think we can also keep the dequantization from the transformers integration for the CPU backend until...

@AlpinDale I think we can discuss this further on Discord. How can I reach you there?

Currently, installing `gguf` from PyPI only gets `gguf=0.6.0`, which is an old release from months ago. However, imatrix quantization requires the newest version, which needs to be installed from...

Nice! I will check it out and add tests for qwen2 and imatrix!

@mgoin Could you please take another look at this? The way `get_quant_method` is handled for the vocab embedding confuses me. Could you give some suggestions? Thanks!

Tensor parallelism doesn't work yet, because we didn't account for the distributed case with `tp_size` and `tp_rank` when modifying `weight_loader` for gguf quantization. I will try to fix the tensor...
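
As a rough sketch of what a rank-aware loader has to do, the snippet below slices a weight along its first dimension using `tp_size` and `tp_rank`. The function name `shard_rows` and the plain list-of-rows weight are assumptions for illustration only; this is not vLLM's `weight_loader`, and it ignores the quantization block layout that a real gguf loader would have to respect.

```python
from typing import List, Sequence


def shard_rows(weight: Sequence[Sequence[float]], tp_size: int, tp_rank: int) -> List[Sequence[float]]:
    """Hypothetical helper: return the rows of `weight` owned by `tp_rank`."""
    if not 0 <= tp_rank < tp_size:
        raise ValueError(f"tp_rank={tp_rank} is out of range for tp_size={tp_size}")
    rows = len(weight)
    # Assumption: rows split evenly across ranks; real code may need padding.
    if rows % tp_size != 0:
        raise ValueError(f"{rows} rows cannot be split evenly across {tp_size} ranks")
    shard_size = rows // tp_size
    start = tp_rank * shard_size
    return list(weight[start:start + shard_size])


# Toy 4x2 weight split across two ranks.
full_weight = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
rank0 = shard_rows(full_weight, tp_size=2, tp_rank=0)
rank1 = shard_rows(full_weight, tp_size=2, tp_rank=1)
```

With `tp_size=1` the function returns the full weight, which is the only case the loader handled at this point.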

OK, I have added a check that raises an exception for `tp_size>1` when initializing `GGUFConfig`.
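
A minimal sketch of that kind of guard, assuming the config receives `tp_size` directly. `SketchGGUFConfig` is a hypothetical stand-in, not the actual `GGUFConfig` class, whose constructor and the way it obtains `tp_size` may differ.

```python
from dataclasses import dataclass


@dataclass
class SketchGGUFConfig:
    """Hypothetical stand-in for a gguf quantization config (not vLLM's class)."""

    tp_size: int = 1

    def __post_init__(self) -> None:
        # Fail at construction time rather than later during weight loading.
        if self.tp_size > 1:
            raise ValueError(
                "gguf quantization does not support tensor parallelism yet "
                f"(got tp_size={self.tp_size})."
            )


# A single-rank config is accepted; tp_size=2 would raise ValueError.
single_rank = SketchGGUFConfig(tp_size=1)
```

Raising at initialization gives the user a clear error before any weights are read.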

@vbiral Thanks for reporting! It seems that `gguf_to_hf_name_map` didn't handle `rope_freqs` correctly. I will have a look and fix it.

I think we should also add an audio example for phi-4-mm, since it supports audio inputs as well.

I don't have a machine to test the 38B model yet. Can you check whether smaller models like 8B/14B also have this issue?