Jee Jee Li
@chenqianfzh Can we add more quantization-type examples in qlora_example.py, such as GPTQ+LoRA, so that users can refer to this script to learn how to use LoRA on quantized models,...
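For anyone looking for a starting point, here is a minimal sketch of what such an example might look like (the model checkpoint and adapter path below are placeholders, not from the actual script):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Sketch only: serve a GPTQ-quantized base model with a LoRA adapter.
# "TheBloke/Llama-2-7B-GPTQ" and "/path/to/lora_adapter" are placeholders.
llm = LLM(
    model="TheBloke/Llama-2-7B-GPTQ",
    quantization="gptq",
    enable_lora=True,
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.0, max_tokens=32),
    # LoRARequest(adapter_name, unique_int_id, local_adapter_path)
    lora_request=LoRARequest("my-adapter", 1, "/path/to/lora_adapter"),
)
print(outputs[0].outputs[0].text)
```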
> @jeejeelee @Yard1 @mgoin
>
> I have updated the PR, addressing and resolving all the comments. Additionally, I have added the necessary unit tests. Could you please review it...
@ywang96 Thanks for driving the integration of more MM models into vLLM. :heart_eyes: It seems there is no plan to refactor the `vision encoder` (TODO in [llava](https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/llava.py#L5)). In my view,...
It might be due to bf16: SM75 (Turing) doesn't support bf16.
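A quick way to confirm this on the failing machine (assuming a CUDA build of PyTorch):

```python
import torch

# SM75 (Turing, e.g. T4 / RTX 20xx) reports compute capability (7, 5);
# bf16 requires compute capability >= 8.0 (Ampere or newer).
major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: {major}.{minor}")
print(f"bf16 supported: {torch.cuda.is_bf16_supported()}")
```

If it reports (7, 5), passing `dtype="float16"` (or `--dtype float16` on the CLI) instead of bf16 should work.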
GPTQ is not yet supported for this MoE model. There is a PR in vLLM attempting to address this; see https://github.com/vllm-project/vllm/pull/6502
The main issue is that `FusedMoE` doesn't support LoRA, which is blocking this feature.
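For context, a rough sketch of why this is non-trivial (illustrative code, not vLLM's implementation): LoRA wraps an individual linear layer with a low-rank update, but a fused MoE layer stores all experts' weights in a single stacked tensor, so there is no per-expert `nn.Linear` to wrap.

```python
import torch
import torch.nn as nn

# Illustrative only -- not vLLM's FusedMoE. LoRA on a plain linear layer
# just adds a low-rank delta B @ A to the frozen base projection:
class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling

# A fused MoE layer instead keeps every expert in one tensor, e.g. a
# weight of shape (num_experts, intermediate_size, hidden_size), and
# dispatches tokens inside a fused kernel -- so per-expert LoRA deltas
# would have to be applied inside that kernel, not via a wrapper module.
```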
Yes, it's a bug. I'm working on fixing it.
@bi1101 Can you share your LoRA config?
It seems that the expert layers have been fine-tuned, which indeed makes it difficult to support LoRA in the short term.
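To check whether that's the case for a given adapter, one can look at `target_modules` in the adapter's `adapter_config.json`. A hypothetical inspection snippet (the expert module names `w1`/`w2`/`w3` follow Mixtral's naming):

```python
import json

# Hypothetical check: if target_modules includes MoE expert projections
# (w1/w2/w3 in Mixtral), the adapter touches FusedMoE weights and
# cannot currently be applied in vLLM.
with open("adapter_config.json") as f:
    cfg = json.load(f)

expert_modules = {"w1", "w2", "w3"}
targeted = expert_modules & set(cfg.get("target_modules", []))
print("Expert layers targeted:", sorted(targeted) or "none")
```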