orthogonal lora layer init
see: https://datta0.github.io/posts/rethink-lora-init/
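For readers who don't want to follow the link, here is a minimal sketch of the general idea, assuming a standard lora_A/lora_B pair and torch.nn.init.orthogonal_; the exact scaling, and how (or whether) the pair is kept a no-op at initialization, may differ in the PR:

import torch.nn as nn

def orthogonal_lora_init(lora_A: nn.Linear, lora_B: nn.Linear) -> None:
    # Fill both low-rank factors with random (semi-)orthogonal matrices,
    # instead of deriving them from the pretrained weight as OLoRA/PiSSA do.
    nn.init.orthogonal_(lora_A.weight)
    nn.init.orthogonal_(lora_B.weight)

# Example: rank-8 adapter for a 4096 -> 4096 projection
lora_A = nn.Linear(4096, 8, bias=False)
lora_B = nn.Linear(8, 4096, bias=False)
orthogonal_lora_init(lora_A, lora_B)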
This is just OLoRA but starting from random weights. How can starting from random weights, rather than extracting that information from the pretrained weights, converge faster? Did you actually run tests? In our research, and in every other subsequent study, OLoRA and other derivatives such as PiSSA performed better than any random initialization. For a list of studies, see.
@tokenizer-decode Thanks for commenting. It would indeed be nice to see a comparison with OLoRA or PiSSA, which the linked blog post didn't test. I could see an argument for the proposed initialization method being easier to use, since the base weights are unchanged, so even if it's not quite as good, there could still be some value. WDYT?
I honestly don't see the performance benefit. But if you think there is an ease-of-use benefit, there could be some value.
This goes for every other decomposition method as well, e.g. SVD. If the value lies in not updating the base weights, we could let the user pass a parameter like no_update that turns off the part where we update the base weights (see the sketch below).
But I should add, for future readers who are confused: updating the base weights is generally where the performance gain comes from.
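To make the no_update idea concrete, here is a hypothetical sketch of an SVD-style init where folding the extracted component out of the base weight is optional (the function and parameter names are illustrative, not an existing PEFT API):

import torch

def svd_style_init(base_weight: torch.Tensor, r: int, no_update: bool = False):
    # Build the adapter from the top-r singular components of the pretrained weight
    U, S, Vh = torch.linalg.svd(base_weight, full_matrices=False)
    lora_B = U[:, :r] * S[:r].sqrt()              # (out_features, r)
    lora_A = S[:r].sqrt().unsqueeze(1) * Vh[:r]   # (r, in_features)
    if not no_update:
        # PiSSA-style: subtract the extracted component from the base weight
        # so the overall model output is unchanged at initialization
        base_weight = base_weight - lora_B @ lora_A
    return lora_A, lora_B, base_weight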
If it's easier, I can convert this so that init_lora accepts a callable and users can provide their own initialization function.
EDIT: something like
from typing import Protocol

class InitLoraWeights(Protocol):
    def __call__(self, layer, adapter_name) -> None:
        ...
and the Config typing would look something like:
bool | Literal[...] | InitLoraWeights
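Under that proposal, a user-supplied initializer might look like this (purely illustrative; the callable-based API is only being proposed here, and the adapter attribute layout is assumed):

import torch.nn as nn

def my_orthogonal_init(layer, adapter_name) -> None:
    # Orthogonal A, zero B, so the adapter starts as a no-op
    nn.init.orthogonal_(layer.lora_A[adapter_name].weight)
    nn.init.zeros_(layer.lora_B[adapter_name].weight)

# config = LoraConfig(..., init_lora_weights=my_orthogonal_init)  # hypothetical usage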
Here's GRPO + PEFT. The olora initialization goes straight to 0.0 rewards after the first step.
Thanks for running the tests :tada: Is the script open so that we can check what might be going on with OLoRA?
If it's easier, I can convert this so that init_lora accepts a callable and users can provide their own initialization function.
In general, we would like to avoid this, even though it could be practical. The reason is that we wouldn't be able to serialize the LoraConfig to JSON if it contains values that are Python code.
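For illustration, the serialization problem in a nutshell:

import json

def my_init(layer, adapter_name) -> None: ...

json.dumps({"init_lora_weights": True})     # works
json.dumps({"init_lora_weights": "olora"})  # works
json.dumps({"init_lora_weights": my_init})  # TypeError: Object of type function is not JSON serializable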
In sum, I think we can still proceed with the orthogonal weight initialization method. As mentioned, even if it does not outperform OLoRA or similar methods, it could still be valuable as a more user-friendly option.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
@winglian Do you have time to finish the PR? If not, let us know so that one of us can take over.
@winglian I finished up the PR in #2498, would be grateful if you could take a look. Of course, I would add you as a co-author (we could add @datta0 as well).
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
@winglian I merged #2498, which supersedes this PR, so I'm closing it now. I added you as co-author.