Jee Jee Li
> Thank you. I am new to vLLM. Is this online inference?

Yes, it is.
> So, how about offline inference? Can the async engine be used in offline inference?

I don't think it can. Offline inference goes through the synchronous `LLM` entrypoint instead.
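A minimal sketch of offline (batch) inference with the synchronous `LLM` entrypoint; the model name and prompt here are placeholders, not from this thread:

```python
from vllm import LLM, SamplingParams

# Offline inference: load the model once and generate over a batch of prompts.
llm = LLM(model="meta-llama/Llama-2-7b-hf")  # placeholder model
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Hello, my name is"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```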
Thank you very much for bringing up this feature. We will consider supporting it.
> [@jeejeelee](https://github.com/jeejeelee) https://x.com/winglian/status/1888951180606202028 GRPO + DoRA converges faster than GRPO + FFT or GRPO + LoRA (thanks [@winglian](https://github.com/winglian) for the great finding!)

Thanks, I will start trying to support DoRA soon.
I have considered this issue before. A rather tricky problem is that the TP>1 case is not easy to handle.
Let's keep it open, thank you.
It looks like there's a problem with your LoRA weights, or you've loaded incorrect weights:

```python
2024-10-16T03:44:47.190186093Z ERROR 10-15 20:44:47 engine.py:160] raise ValueError(f"{name} is unsupported LoRA weight")
2024-10-16T03:44:47.190202703Z ERROR 10-15...
```
> Does vLLM not support QLORA for GPTQ models yet?

vLLM does support QLoRA for GPTQ models.
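A hedged sketch of serving a LoRA adapter on top of a GPTQ-quantized base model; the model name, adapter path, and prompt are placeholders:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load a GPTQ-quantized base model with LoRA support enabled.
llm = LLM(
    model="TheBloke/Llama-2-7B-GPTQ",  # placeholder GPTQ model
    quantization="gptq",
    enable_lora=True,
)

# Attach the adapter per request via a LoRARequest(name, id, path).
outputs = llm.generate(
    ["Summarize: vLLM supports LoRA on quantized models."],
    SamplingParams(max_tokens=64),
    lora_request=LoRARequest("my_adapter", 1, "/path/to/lora_adapter"),  # placeholder path
)
print(outputs[0].outputs[0].text)
```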
You mean your LoRA config includes `modules_to_save`, right? If that's the case, the current vLLM doesn't support this.
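For illustration, this is the kind of PEFT `LoraConfig` that produces such an adapter; the module names are placeholders. Adapters saved with `modules_to_save` include fully fine-tuned copies of those modules, and that extra part is what vLLM cannot load:

```python
from peft import LoraConfig

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    # These modules are saved as full weights alongside the LoRA matrices,
    # which is the part vLLM currently cannot load.
    modules_to_save=["lm_head", "embed_tokens"],
)
```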
> If the loading of adapter works with Transformers, then it should also work with vLLM, right?

That's not necessarily the case. For example, vLLM currently doesn't support features like `dora`.
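A small compatibility check you could run before loading an adapter, assuming a standard PEFT `adapter_config.json`; the path is a placeholder and the keys checked (`use_dora`, `modules_to_save`) are PEFT config fields:

```python
import json

with open("/path/to/lora_adapter/adapter_config.json") as f:  # placeholder path
    cfg = json.load(f)

# DoRA adapters and adapters with extra saved modules are the common cases
# that load fine in Transformers/PEFT but are not yet supported by vLLM.
if cfg.get("use_dora"):
    print("Adapter uses DoRA, which vLLM does not currently support.")
if cfg.get("modules_to_save"):
    print("Adapter uses modules_to_save, which vLLM does not currently support.")
```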