lorax
About the DoRA weights inference
Feature request
DoRA introduces more overhead than plain LoRA, so the PEFT docs recommend merging the weights for inference (see https://github.com/huggingface/peft/blob/main/docs/source/developer_guides/lora.md#weight-decomposed-low-rank-adaptation-dora). Merging, however, seems to break the current dynamic inference feature. Is there any plan to address this?
Motivation
Support a new LoRA method (DoRA).
Your contribution
depends
Hey @thincal, we definitely plan on supporting DoRA. I think the main thing that needs to be figured out is how to efficiently serve DoRA without merging the weights back into the base model, while still achieving good throughput and latency. This will require some experimentation on our side.
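To make the "without merging" idea concrete, here is a minimal sketch of one possible approach. It is not LoRAX's implementation and has not been benchmarked. It assumes the DoRA weight is `m * (W0 + s*B*A) / ||W0 + s*B*A||` with one norm per output feature (the convention PEFT appears to use), and that the adapter is frozen at inference time so the per-feature scale can be precomputed once per adapter. All function names (`dora_row_scale`, `dora_forward_unmerged`, `dora_forward_merged`) and the toy dimensions are hypothetical. Pure Python lists stand in for tensors to keep the example dependency-free.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def dora_row_scale(w0, lora_a, lora_b, magnitude, scaling):
    """Precompute m / ||W0 + s*B*A|| per output feature (once per adapter)."""
    delta = matmul(lora_b, lora_a)  # shape (out, in)
    scale = []
    for w_row, d_row, m in zip(w0, delta, magnitude):
        norm = math.sqrt(sum((w + scaling * d) ** 2 for w, d in zip(w_row, d_row)))
        scale.append(m / norm)
    return scale


def dora_forward_unmerged(x, w0, lora_a, lora_b, row_scale, scaling):
    """Base matmul stays shared; the adapter adds a low-rank path and a rescale."""
    base = matvec(w0, x)                          # shared across adapters
    low_rank = matvec(lora_b, matvec(lora_a, x))  # per-adapter LoRA path
    return [s * (b + scaling * l) for s, b, l in zip(row_scale, base, low_rank)]


def dora_forward_merged(x, w0, lora_a, lora_b, magnitude, scaling):
    """Reference: build the merged DoRA weight explicitly, then apply it."""
    delta = matmul(lora_b, lora_a)
    merged = []
    for w_row, d_row, m in zip(w0, delta, magnitude):
        v = [w + scaling * d for w, d in zip(w_row, d_row)]
        norm = math.sqrt(sum(e * e for e in v))
        merged.append([m * e / norm for e in v])
    return matvec(merged, x)


def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy dimensions and values, chosen only for illustration.
random.seed(0)
out_dim, in_dim, rank = 4, 6, 2
scaling = 0.5  # stands in for lora_alpha / r
w0 = rand_matrix(out_dim, in_dim)
lora_a = rand_matrix(rank, in_dim)
lora_b = rand_matrix(out_dim, rank)
magnitude = [random.uniform(0.5, 1.5) for _ in range(out_dim)]
x = [random.uniform(-1.0, 1.0) for _ in range(in_dim)]

row_scale = dora_row_scale(w0, lora_a, lora_b, magnitude, scaling)
y_unmerged = dora_forward_unmerged(x, w0, lora_a, lora_b, row_scale, scaling)
y_merged = dora_forward_merged(x, w0, lora_a, lora_b, magnitude, scaling)
max_diff = max(abs(u - v) for u, v in zip(y_unmerged, y_merged))
print(max_diff)
```

Under these assumptions the extra per-adapter state beyond plain LoRA is a single vector of length `out_dim`, and the extra per-token work is one elementwise multiply. The catch is that the scale is applied to the base output as well, so the base result can no longer simply be summed with the adapter delta; whether that fits the existing batched adapter kernels is exactly the part that would need experimentation.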