
About the DoRA weights inference

Open — thincal opened this issue 1 year ago · 1 comment

Feature request

DoRA introduces more inference overhead than pure LoRA, so the PEFT docs recommend merging the weights for inference: https://github.com/huggingface/peft/blob/main/docs/source/developer_guides/lora.md#weight-decomposed-low-rank-adaptation-dora. It seems that merging would break the current dynamic adapter-loading feature. Has this been considered?
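To make the overhead concrete, here is a minimal NumPy sketch of the difference between a plain LoRA forward pass and a DoRA one. This is illustrative only, not the lorax or peft implementation; the normalization axis and scaling follow the DoRA formulation approximately, and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 8, 2

W0 = rng.standard_normal((d_out, d_in))        # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01      # LoRA down-projection
B = rng.standard_normal((d_out, r)) * 0.01     # LoRA up-projection
# DoRA's learned magnitude vector, initialized from the base weight norms
# (per-output-channel here; the exact axis follows the reference implementation).
m = np.linalg.norm(W0, axis=1, keepdims=True)

x = rng.standard_normal((1, d_in))

# Plain LoRA: one extra low-rank matmul, cheap to apply per request.
y_lora = x @ (W0 + B @ A).T

# DoRA: the adapted weight must be re-normalized before the magnitude m is
# applied, which touches the full (d_out, d_in) matrix on every forward pass
# unless the result is merged into the checkpoint ahead of time.
V = W0 + B @ A
W_dora = m * V / np.linalg.norm(V, axis=1, keepdims=True)
y_dora = x @ W_dora.T
```

Merging bakes `W_dora` into a dense per-adapter weight, which removes the runtime cost but means the adapter no longer shares `W0` with other adapters, which is what conflicts with dynamic multi-adapter serving.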

Motivation

Support the new DoRA adapter method.

Your contribution

depends

thincal avatar Mar 05 '24 03:03 thincal

Hey @thincal, we definitely plan on supporting DoRA. The main thing that needs to be figured out is how to serve DoRA efficiently without merging the weights back into the base model, while still achieving good throughput and latency. This will require some experimentation on our side.

tgaddair avatar Mar 05 '24 20:03 tgaddair