Jee Jee Li


> Thank you. I am new to vLLM. Is this online inference?

Yes.

> So, how about offline inference? Can the async engine be used in offline inference?

I think it cannot. For offline inference, use the synchronous `LLM` entrypoint instead.
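For reference, a minimal sketch of vLLM's offline (batch) inference path; the model name and prompt are placeholders:

```python
from vllm import LLM, SamplingParams

# Offline inference uses the synchronous LLM entrypoint; the async engine
# is intended for online serving (e.g. the OpenAI-compatible API server).
llm = LLM(model="facebook/opt-125m")  # placeholder model
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Hello, my name is"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```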

Thank you very much for bringing up this feature. We will consider supporting it.

> [@jeejeelee](https://github.com/jeejeelee) https://x.com/winglian/status/1888951180606202028 GRPO + DoRA converges faster than GRPO + FFT or GRPO + LoRA (thanks [@winglian](https://github.com/winglian) for the great finding!)

Thanks, I will start trying to support DoRA soon.

I have considered this issue before. A rather tricky problem is that the tensor-parallel case (TP > 1) is not easy to handle.
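To make the difficulty concrete (assuming this refers to DoRA support, per the comment above): DoRA needs a per-output-channel norm of the merged weight `W0 + B @ A` over the full input dimension, and row-parallel tensor parallelism shards exactly that dimension. A toy, single-process sketch of the cross-rank reduction that would be needed; all names and shapes are illustrative, not vLLM internals:

```python
import torch

out_features, in_features, rank, tp = 8, 16, 4, 2
W = torch.randn(out_features, in_features)   # frozen base weight
B = torch.randn(out_features, rank)          # LoRA B
A = torch.randn(rank, in_features)           # LoRA A

# DoRA's magnitude update uses a per-output-channel norm of the merged
# weight over the full input dimension (following PEFT's implementation).
merged = W + B @ A
full_norm = torch.linalg.norm(merged, dim=1)  # shape: (out_features,)

# Simulate row-parallel TP: the input dimension is split across `tp` ranks,
# so each rank can only compute a partial sum of squares locally.
shard = in_features // tp
partial_sq = [
    (merged[:, i * shard:(i + 1) * shard] ** 2).sum(dim=1) for i in range(tp)
]

# Recovering the true norm requires combining the partials across ranks,
# i.e. an all-reduce in a real TP setup.
recovered = torch.sqrt(sum(partial_sq))
assert torch.allclose(full_norm, recovered, atol=1e-5)
```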

Let's keep it open. Thank you.

It looks like there's a problem with your LoRA weights, or you've loaded incorrect weights:

```
2024-10-16T03:44:47.190186093Z ERROR 10-15 20:44:47 engine.py:160] raise ValueError(f"{name} is unsupported LoRA weight")
2024-10-16T03:44:47.190202703Z ERROR 10-15...
```
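That error is raised when vLLM encounters a tensor name it doesn't recognize while loading the adapter. One way to see what your checkpoint actually contains is to list its keys; a sketch using `safetensors`, where the adapter path and file name are assumptions based on PEFT's default layout:

```python
from safetensors.torch import load_file

# Hypothetical adapter path; adjust to your checkpoint.
state_dict = load_file("my-adapter/adapter_model.safetensors")

# A loadable LoRA checkpoint should contain lora_A / lora_B tensors for
# supported target modules; unexpected names (e.g. full module weights
# saved via modules_to_save) can trigger "unsupported LoRA weight".
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```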

> Does vLLM not support QLORA for GPTQ models yet?

vLLM does support QLoRA for GPTQ models.
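For reference, a minimal offline sketch of serving a GPTQ-quantized base model with a LoRA adapter; the model and adapter paths are placeholders, not a tested recipe:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholder GPTQ model and adapter; swap in your own.
llm = LLM(
    model="TheBloke/Llama-2-7B-GPTQ",
    quantization="gptq",
    enable_lora=True,
)

outputs = llm.generate(
    ["Write a haiku about GPUs."],
    SamplingParams(max_tokens=64),
    # LoRARequest(name, unique integer id, path to the adapter)
    lora_request=LoRARequest("my_adapter", 1, "path/to/adapter"),
)
print(outputs[0].outputs[0].text)
```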

You mean your LoRA config includes `modules_to_save`, right? If that's the case, current vLLM doesn't support it.
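For context, `modules_to_save` is a PEFT `LoraConfig` field that fully fine-tunes extra modules (beyond the low-rank adapters) and saves their full weights with the checkpoint; that extra state is what vLLM's LoRA loader can't handle. An example of a config that would hit this limitation (module names are illustrative):

```python
from peft import LoraConfig

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    # Fully fine-tuned modules saved alongside the adapter;
    # this is the field vLLM's LoRA loader does not support.
    modules_to_save=["lm_head", "embed_tokens"],
)
```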

> If loading the adapter works with Transformers, then it should also work with vLLM, right?

That's not quite right. For example, vLLM currently doesn't support features like `dora`.
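A quick pre-flight check is to read the adapter's `adapter_config.json` and look for PEFT features that vLLM doesn't load; a sketch, assuming PEFT's standard config fields and a placeholder path:

```python
import json

with open("my-adapter/adapter_config.json") as f:  # placeholder path
    cfg = json.load(f)

# PEFT writes these fields; both are unsupported by vLLM's LoRA loader
# at the time of this comment.
if cfg.get("use_dora"):
    print("Adapter uses DoRA: not supported by vLLM.")
if cfg.get("modules_to_save"):
    print("Adapter has modules_to_save: not supported by vLLM.")
```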