Jee Jee Li
> Thank you. I am new to vLLM. Is this online inference?

Yes, it is.
> So, how about offline inference? Can the async engine be used in offline inference?

I don't think it can. Offline inference goes through the synchronous `LLM` entrypoint instead.
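A minimal sketch of offline (batch) inference with the synchronous `LLM` entrypoint; the model name and prompt here are placeholders, not from this thread:

```python
from vllm import LLM, SamplingParams

# Offline inference: load the model once and generate over a batch of prompts.
llm = LLM(model="meta-llama/Llama-2-7b-hf")  # placeholder model
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Hello, my name is"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```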
Thank you very much for bringing up this feature. We will consider supporting it.
> [@jeejeelee](https://github.com/jeejeelee) https://x.com/winglian/status/1888951180606202028 GRPO + DoRA converges faster than GRPO + FFT or GRPO + LoRA (thanks [@winglian](https://github.com/winglian) for the great finding!)

Thanks, I will start trying to support DoRA soon.
I have considered this issue before. A rather tricky problem is that the TP>1 case is not easy to handle.
Let's keep it open, thank you.
It looks like there's a problem with your LoRA weights, or you've loaded incorrect weights:

```python
2024-10-16T03:44:47.190186093Z ERROR 10-15 20:44:47 engine.py:160] raise ValueError(f"{name} is unsupported LoRA weight")
2024-10-16T03:44:47.190202703Z ERROR 10-15...
```
> Does vLLM not support QLORA for GPTQ models yet?

vLLM does support QLoRA for GPTQ models.
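A hedged sketch of serving a LoRA adapter on top of a GPTQ-quantized base model; the model name, adapter path, and prompt are placeholders:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load a GPTQ-quantized base model with LoRA support enabled.
llm = LLM(
    model="TheBloke/Llama-2-7B-GPTQ",  # placeholder GPTQ model
    quantization="gptq",
    enable_lora=True,
)

# Attach the adapter per request via a LoRARequest(name, id, path).
outputs = llm.generate(
    ["Summarize: vLLM supports LoRA on quantized models."],
    SamplingParams(max_tokens=64),
    lora_request=LoRARequest("my_adapter", 1, "/path/to/lora_adapter"),  # placeholder path
)
print(outputs[0].outputs[0].text)
```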
You mean your LoRA config includes `modules_to_save`, right? If that's the case, the current vLLM doesn't support this.
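For illustration, this is the kind of PEFT `LoraConfig` that produces such an adapter; the module names are placeholders. Adapters saved with `modules_to_save` include fully fine-tuned copies of those modules, and that extra part is what vLLM cannot load:

```python
from peft import LoraConfig

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    # These modules are saved as full weights alongside the LoRA matrices,
    # which is the part vLLM currently cannot load.
    modules_to_save=["lm_head", "embed_tokens"],
)
```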
> If the loading of adapter works with Transformers, then it should also work with vLLM, right?

That's not necessarily the case. For example, vLLM currently doesn't support features like `dora`.
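A small compatibility check you could run before loading an adapter, assuming a standard PEFT `adapter_config.json`; the path is a placeholder and the keys checked (`use_dora`, `modules_to_save`) are PEFT config fields:

```python
import json

with open("/path/to/lora_adapter/adapter_config.json") as f:  # placeholder path
    cfg = json.load(f)

# DoRA adapters and adapters with extra saved modules are the common cases
# that load fine in Transformers/PEFT but are not yet supported by vLLM.
if cfg.get("use_dora"):
    print("Adapter uses DoRA, which vLLM does not currently support.")
if cfg.get("modules_to_save"):
    print("Adapter uses modules_to_save, which vLLM does not currently support.")
```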