Jee Jee Li


Thanks for completing this feature. I have two questions about it:
- Is this feature compatible with PEFT?
- Have you done any benchmarking? Adding `--enable-lora-bias` seems to...
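For context, a minimal sketch of how the flag might be exercised through vLLM's offline `LLM` API; the model name and adapter path are placeholders, and I'm assuming `enable_lora_bias` is the engine-arg counterpart of the `--enable-lora-bias` CLI flag:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # placeholder base model
    enable_lora=True,
    enable_lora_bias=True,             # assumed counterpart of --enable-lora-bias
)
outputs = llm.generate(
    ["Write a haiku about GPUs."],
    SamplingParams(max_tokens=64),
    # PEFT-style adapter directory; args are (name, int id, path).
    lora_request=LoRARequest("my_adapter", 1, "/path/to/adapter"),
)
print(outputs[0].outputs[0].text)
```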

See: https://github.com/vllm-project/vllm/pull/13167

I haven't done precise testing, but I think your scenario behaves as expected.

> Oh, thank you. But do you have any hypotheses why this is so? What could be causing the slowdown? After all, these are requests that do not use adapters...

See: https://docs.vllm.ai/en/latest/dev/profiling/profiling_index.html
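To narrow down where the slowdown comes from, a hedged sketch following the linked profiling docs: with `VLLM_TORCH_PROFILER_DIR` set before engine construction, traces are written around the profiled region (model name is a placeholder):

```python
import os
os.environ["VLLM_TORCH_PROFILER_DIR"] = "/tmp/vllm_profile"  # set before engine init

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")  # placeholder model
llm.start_profile()
llm.generate(["Hello"], SamplingParams(max_tokens=32))
llm.stop_profile()
# Inspect the resulting trace (e.g., in Perfetto or chrome://tracing) to see
# where time goes for requests that do not use adapters.
```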

> `pip install -e .` doesn't finish even after half an hour

Use `pip install -vvv -e .` to display the build details; the likely cause could be related to compiling C...

Sorry for the delayed feedback. I can repro the garbage outputs. I guess the cause is `tie_word_embeddings`, so I deleted these [lines](https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/llama.py#L443-L447), and the generated results now...
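For readers following along, a minimal, self-contained sketch (illustrative only, not vLLM code) of what `tie_word_embeddings` generally means; when the tie is mishandled during weight loading, `lm_head` ends up projecting with the wrong matrix, which is one way to get garbage outputs:

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size: int, hidden: int, tie_word_embeddings: bool = True):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lm_head = nn.Linear(hidden, vocab_size, bias=False)
        if tie_word_embeddings:
            # Share one tensor: lm_head.weight *is* embed.weight, not a copy.
            self.lm_head.weight = self.embed.weight

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.lm_head(self.embed(token_ids))

model = TinyLM(vocab_size=100, hidden=16)
# The tied weights point at the same underlying storage.
assert model.lm_head.weight.data_ptr() == model.embed.weight.data_ptr()
```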

Thanks. IMHO, this issue should be addressed by bundling the custom cache manager code inside vLLM. cc @simon-mo @youkaichao @Yard1
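A hedged sketch of the mechanism being discussed: Triton allows swapping in a custom cache manager via the `TRITON_CACHE_MANAGER` environment variable, which takes a `module.path:ClassName` string. The module path and helper name below are illustrative assumptions, not confirmed vLLM code:

```python
import os

def maybe_set_custom_triton_cache_manager() -> None:
    """Point Triton at a bundled cache manager (hypothetical module path)."""
    if "TRITON_CACHE_MANAGER" not in os.environ:
        os.environ["TRITON_CACHE_MANAGER"] = (
            "vllm.triton_utils.custom_cache_manager:CustomCacheManager"
        )

# Calling this before any Triton kernels compile avoids per-process cache
# clashes, which is the motivation for bundling the code inside vLLM.
maybe_set_custom_triton_cache_manager()
```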