lorax About the DoRA weights inference

Open thincal opened this issue 1 year ago • 1 comments

Feature request

DoRA introduces more overhead than plain LoRA, so merging the weights for inference is recommended; see https://github.com/huggingface/peft/blob/main/docs/source/developer_guides/lora.md#weight-decomposed-low-rank-adaptation-dora. It seems that merging would break the current dynamic inference feature. Has this been considered?

Motivation

Support a new LoRA method (DoRA).

Your contribution

depends

Mar 05 '24 03:03 thincal

Hey @thincal, we definitely plan on supporting DoRA. I think the main thing that needs to be figured out is how to serve DoRA efficiently without merging the weights back into the base model, while still achieving good throughput and latency. This will require some experimentation on our side.

Mar 05 '24 20:03 tgaddair
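
For readers following the thread, here is a minimal sketch of the trade-off being discussed. It is not LoRAX or PEFT code. It assumes the DoRA weight has the form `m * (W0 + B A) / ||W0 + B A||`, with the norm taken per output feature (the paper describes it as a column-wise norm under its own layout), and it leaves out the LoRA `alpha / r` scaling and dropout. All function names are hypothetical. It shows that the merged weight and an unmerged forward pass give the same output, and that the unmerged path still needs a per-adapter scale derived from the full base weight, which is the extra cost compared with plain LoRA.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dora_scale_and_weight(w0, lora_b, lora_a, magnitude):
    """Return (scale, adapted) where adapted = W0 + B A and
    scale[i] = magnitude[i] / ||adapted[i]|| for each output feature i.
    Assumption: the norm is per output row of an (out, in) weight."""
    delta = matmul(lora_b, lora_a)
    adapted = [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(w0, delta)]
    norms = [math.sqrt(sum(v * v for v in row)) for row in adapted]
    return [m / n for m, n in zip(magnitude, norms)], adapted


def forward_merged(x, w0, lora_b, lora_a, magnitude):
    """Build the merged DoRA weight once, then do a single matmul."""
    scale, adapted = dora_scale_and_weight(w0, lora_b, lora_a, magnitude)
    merged = [[s * v for v in row] for s, row in zip(scale, adapted)]
    return matmul(x, transpose(merged))


def forward_unmerged(x, w0, lora_b, lora_a, magnitude):
    """Keep W0 untouched: shared base path plus a per-adapter low-rank path,
    followed by a per-adapter, per-output-feature rescale."""
    scale, _ = dora_scale_and_weight(w0, lora_b, lora_a, magnitude)
    base = matmul(x, transpose(w0))  # shared across adapters
    low_rank = matmul(matmul(x, transpose(lora_a)), transpose(lora_b))
    return [
        [s * (b + l) for s, b, l in zip(scale, b_row, l_row)]
        for b_row, l_row in zip(base, low_rank)
    ]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    out_f, in_f, rank, batch = 4, 3, 2, 5  # toy sizes, chosen arbitrarily
    w0 = random_matrix(out_f, in_f, rng)
    lora_b = random_matrix(out_f, rank, rng)
    lora_a = random_matrix(rank, in_f, rng)
    magnitude = [abs(rng.gauss(1.0, 0.1)) for _ in range(out_f)]
    x = random_matrix(batch, in_f, rng)

    y_merged = forward_merged(x, w0, lora_b, lora_a, magnitude)
    y_unmerged = forward_unmerged(x, w0, lora_b, lora_a, magnitude)
    diff = max(abs(a - b) for ra, rb in zip(y_merged, y_unmerged) for a, b in zip(ra, rb))
    print("max abs difference:", diff)
```

In this sketch the scale vector depends only on the weights, so a server could in principle compute it once when an adapter is loaded and reuse it across requests. Whether that is enough to reach acceptable throughput in a multi-adapter batch is the open question raised in the reply above.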
