LugerW-A
Did you solve it?
--max-num-seqs doesn't help. It still takes about 500 ms in the "Adding requests" phase...
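For context, this is roughly how I pass it in the offline API (a minimal sketch; the model name and the max_num_seqs value are just placeholders):

```python
from vllm import LLM, SamplingParams

# Minimal sketch: model name and max_num_seqs value are placeholders.
# max_num_seqs caps how many sequences the scheduler batches together;
# in my runs it did not reduce the "Adding requests" time.
llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct", max_num_seqs=64)

outputs = llm.generate(
    ["Describe speculative decoding in one sentence."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```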
@CuriousCat-7 Setting a higher OMP_NUM_THREADS value can indeed improve performance. However, as observed in the vLLM project issue (https://github.com/vllm-project/vllm/issues/14538), vLLM might be utilizing only one CPU...
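If anyone wants to try it, here is a minimal sketch (the thread count and model name are just example placeholders and should be tuned for your machine):

```python
import os

# Set OMP_NUM_THREADS before torch/vLLM are imported, otherwise the
# OpenMP thread pool has already been created with the default value.
os.environ["OMP_NUM_THREADS"] = "8"  # example value, not a recommendation

from vllm import LLM  # noqa: E402

llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct")  # placeholder model name
```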
Is there any progress on this?
> > It's supported, pls see examples/multimodal for more info.
>
> Hi, Qwen2-VL can run successfully, but compared to directly importing transformers, there is no significant improvement in time...
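In case it helps with reproduction, this is roughly what I run, following examples/multimodal (a minimal sketch; the prompt template and image path are placeholders and may need to match the Qwen2-VL chat format exactly):

```python
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct")

# Placeholder Qwen2-VL style prompt with one image slot.
prompt = (
    "<|im_start|>user\n"
    "<|vision_start|><|image_pad|><|vision_end|>Describe this image.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
image = Image.open("example.jpg")  # placeholder path

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(temperature=0.0, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```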
Hi, did you solve it?
Is it because of the Python GIL that a single GPU can only use one CPU core, and are there any solutions to this problem now?
What about other versions of vLLM?
Also, running on CPU needs about 400 GB of memory for a 20B model, and it's too slow as well..
Really looking forward to the Eagle3 adaptation for VLMs.