Isotr0py
It's because the quantized models are missing the `"tie_word_embeddings": false` field in `llm_config`; you can add it manually in the model's `config.json`: https://huggingface.co/OpenGVLab/InternVL3-38B/blob/main/config.json#L93
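For reference, a minimal sketch of patching the file in a local checkout (the path here is a placeholder):

```python
import json

# Placeholder path to a local checkout of the quantized model.
config_path = "InternVL3-38B-quantized/config.json"

with open(config_path) as f:
    config = json.load(f)

# Restore the field that the quantization export dropped.
config.setdefault("llm_config", {})["tie_word_embeddings"] = False

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```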
> Is that something that I should dig into from an llm-compressor side, or is it just expected that quantized models may need some config fields copied over from the...
> transformers doesn't serialize config values that are equal to the default values

> I'm wondering how do we end up with different values for tie_word_embeddings in vLLM and in...
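A quick way to see the quoted behavior: `PretrainedConfig.to_diff_dict()`, which `save_pretrained()` uses when writing `config.json`, drops any value equal to the class default:

```python
from transformers import PretrainedConfig

# The base default for tie_word_embeddings is True, so an explicit True is
# omitted from the serialized diff while an explicit False is kept.
print("tie_word_embeddings" in PretrainedConfig(tie_word_embeddings=True).to_diff_dict())   # False
print("tie_word_embeddings" in PretrainedConfig(tie_word_embeddings=False).to_diff_dict())  # True
```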
> Maybe we can do with AutoConfig and reassign values for text config?

Sounds great! We have had to keep lots of modified custom configs in `vllm/transformers_utils/configs` for similar reasons,...
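One way to read that suggestion, as a rough sketch (only the `llm_config` field comes from the thread above; the rest is an assumption):

```python
from transformers import AutoConfig

# Load the composite config with AutoConfig, then reassign the text-config
# values that the serialized config.json lost.
config = AutoConfig.from_pretrained("OpenGVLab/InternVL3-38B", trust_remote_code=True)
config.llm_config.tie_word_embeddings = False
```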
@ywang96 OK, I will decouple them tonight. (Sorry that I don't have bandwidth during the day.)
Seems setting `temperature=0` can generate reasonable grounding outputs:

```
vllm serve OpenGVLab/InternVL3-2B --max-model-len 4096
```

```
$ python examples/online_serving/openai_chat_completion_client_for_multimodal.py
INFO 06-13 17:23:23 [__init__.py:244] Automatically detected platform cuda.
Chat completion output...
```
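In case it helps reproduce this, a minimal sketch of the client-side request with `temperature=0`, assuming a local server started with the command above (the image URL and prompt are placeholders):

```python
from openai import OpenAI

# vLLM's OpenAI-compatible endpoint; the API key is unused locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="OpenGVLab/InternVL3-2B",
    temperature=0,  # greedy decoding keeps the grounding coordinates stable
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Locate the dog in the image with a bounding box."},
            {"type": "image_url", "image_url": {"url": "https://example.com/dog.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```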
This app is a little bit complicated and difficult to debug. Can you check whether this simple script generates correct grounding results on your served model?

```python3
import ast...
```
I'm afraid not, because GGUF support in vLLM depends on the GGUF interoperability in `transformers` (we rely on it to extract the hf_config from GGUF), and Deepseek and its...
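For context, this is the `transformers` interoperability in question: the HF config is reconstructed from GGUF metadata, which only works for architectures `transformers` knows how to map (a sketch; the repo and filename are placeholders):

```python
from transformers import AutoConfig

# Reconstruct an hf_config from GGUF metadata; architectures without this
# mapping in transformers (e.g. Deepseek at the time of this comment) fail here.
config = AutoConfig.from_pretrained(
    "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    gguf_file="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
)
print(config.model_type)
```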
I see, you can refer to the output of the CI here: https://github.com/vllm-project/vllm/actions/runs/11354659097/job/31582376192?pr=9013#step:5:18
@SunMarc `gguf` [v0.10.0](https://pypi.org/project/gguf/0.10.0/) has been released! I think we can start reviewing this PR!