Jee Jee Li


Thanks for completing this feature. I have two questions about it:
- Is this feature compatible with PEFT?
- Have you done any benchmarking? Adding `--enable-lora-bias` seems to...
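For context, a minimal sketch of how the flag might be exercised through vLLM's offline `LLM` API; the model name and adapter path are placeholders, and I'm assuming `enable_lora_bias` is the engine-arg counterpart of the `--enable-lora-bias` CLI flag:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # placeholder base model
    enable_lora=True,
    enable_lora_bias=True,             # assumed counterpart of --enable-lora-bias
)
outputs = llm.generate(
    ["Write a haiku about GPUs."],
    SamplingParams(max_tokens=64),
    # PEFT-style adapter directory; args are (name, int id, path).
    lora_request=LoRARequest("my_adapter", 1, "/path/to/adapter"),
)
print(outputs[0].outputs[0].text)
```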

See: https://github.com/vllm-project/vllm/pull/13167

I haven't done precise testing, but I think your scenario behaves as expected.

> Oh, thank you. But do you have any hypotheses why this is so? What could be causing the slowdown? After all, these are requests that do not use adapters...

See: https://docs.vllm.ai/en/latest/dev/profiling/profiling_index.html
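To narrow down where the slowdown comes from, a hedged sketch following the linked profiling docs: with `VLLM_TORCH_PROFILER_DIR` set before engine construction, traces are written around the profiled region (model name is a placeholder):

```python
import os
os.environ["VLLM_TORCH_PROFILER_DIR"] = "/tmp/vllm_profile"  # set before engine init

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")  # placeholder model
llm.start_profile()
llm.generate(["Hello"], SamplingParams(max_tokens=32))
llm.stop_profile()
# Inspect the resulting trace (e.g., in Perfetto or chrome://tracing) to see
# where time goes for requests that do not use adapters.
```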

> `pip install -e .` doesn't finish even after half an hour

Use `pip install -vvv -e .` to display the build details; the likely cause could be related to compiling C...

Sorry for the delayed feedback. I can repro the garbage outputs. I guess the cause is `tie_word_embeddings`, so I deleted these [lines](https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/llama.py#L443-L447), and the generated results now...
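For readers following along, a minimal, self-contained sketch (illustrative only, not vLLM code) of what `tie_word_embeddings` generally means; when the tie is mishandled during weight loading, `lm_head` ends up projecting with the wrong matrix, which is one way to get garbage outputs:

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size: int, hidden: int, tie_word_embeddings: bool = True):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lm_head = nn.Linear(hidden, vocab_size, bias=False)
        if tie_word_embeddings:
            # Share one tensor: lm_head.weight *is* embed.weight, not a copy.
            self.lm_head.weight = self.embed.weight

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.lm_head(self.embed(token_ids))

model = TinyLM(vocab_size=100, hidden=16)
# The tied weights point at the same underlying storage.
assert model.lm_head.weight.data_ptr() == model.embed.weight.data_ptr()
```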

Thanks. IMHO, this issue should be addressed by bundling the custom cache manager code inside vLLM. cc @simon-mo @youkaichao @Yard1
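A hedged sketch of the mechanism being discussed: Triton allows swapping in a custom cache manager via the `TRITON_CACHE_MANAGER` environment variable, which takes a `module.path:ClassName` string. The module path and helper name below are illustrative assumptions, not confirmed vLLM code:

```python
import os

def maybe_set_custom_triton_cache_manager() -> None:
    """Point Triton at a bundled cache manager (hypothetical module path)."""
    if "TRITON_CACHE_MANAGER" not in os.environ:
        os.environ["TRITON_CACHE_MANAGER"] = (
            "vllm.triton_utils.custom_cache_manager:CustomCacheManager"
        )

# Calling this before any Triton kernels compile avoids per-process cache
# clashes, which is the motivation for bundling the code inside vLLM.
maybe_set_custom_triton_cache_manager()
```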