Benjamin Bossan

584 comments of Benjamin Bossan

Hey @fxmeng, after some internal discussion, we have some concerns about this line: https://github.com/huggingface/peft/pull/1626/files#diff-24a141c266b7b714ae8fcc470f31bc283f7b0f5a671bbf6d5f092741fc374104R194 The issue here is that the model's base weights are modified when initializing with PiSSA. This...
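
For context, here is a minimal, hypothetical sketch (not the actual peft code) of why a PiSSA-style init mutates the base model: the top-r singular directions seed the adapter, and only the residual is kept as the base weight. All names and shapes here are illustrative.

```python
import torch

W = torch.randn(64, 64)  # stand-in for a pre-trained weight
r = 8
U, S, Vh = torch.linalg.svd(W, full_matrices=False)
B = U[:, :r] * S[:r].sqrt()                 # adapter factor, shape (64, r)
A = S[:r].sqrt().unsqueeze(1) * Vh[:r, :]   # adapter factor, shape (r, 64)
W_res = W - B @ A                           # residual replaces W in the base model

# The original W is no longer stored anywhere: a checkpoint of the base
# model now contains W_res, not W.
assert torch.allclose(W_res + B @ A, W, atol=1e-5)
```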

> In fact, we can convert a trained PiSSA into LoRA without any loss in performance, allowing the transformed LoRA to be shared while still enjoying the training efficiency improvements brought...

@fxmeng Let me know once this is ready for another review.

> Regarding the conversion from PiSSA to LoRA, it might not be possible to compute $\Delta W$ using only the residual model and PiSSA modules during the training process. Therefore,...

> Yes, the modified base weights + the trained PiSSA weights result in the fine-tuned model, but the difference from the pre-trained model can only be calculated using the initial...
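
To make that arithmetic concrete, here is a small sketch (names and shapes are hypothetical, not the peft API) of why the initial PiSSA factors are needed to recover $\Delta W$ relative to the original base weights:

```python
import torch

# W0: original pre-trained weight; (B0, A0): initial PiSSA factors;
# (B_t, A_t): factors after training. All names are illustrative.
W0 = torch.randn(64, 64)
B0, A0 = torch.randn(64, 8), torch.randn(8, 64)
W_res = W0 - B0 @ A0                        # base weight after PiSSA init
B_t = B0 + 0.1 * torch.randn(64, 8)         # stand-ins for trained factors
A_t = A0 + 0.1 * torch.randn(8, 64)

# The fine-tuned model is W_res + B_t @ A_t; relative to W0 this gives:
delta_W = (W_res + B_t @ A_t) - W0          # == B_t @ A_t - B0 @ A0
assert torch.allclose(delta_W, B_t @ A_t - B0 @ A0, atol=1e-5)

# delta_W has rank <= 2r, so it can be re-expressed as a plain LoRA update,
# but only if (B0, A0) -- or W0 itself -- are still available.
```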

Sorry, I was at a conference these past few days. I will review soon.

Thanks for the updates @fxmeng. Could you please also fix the merge conflict and let me know once this is ready for review?

When I run your test above, I get the same or very similar values, except for T5 + 8bit:

```
(tensor(0.1253, device='cuda:0'), tensor(0.0223, device='cuda:0'), tensor(0.1440, device='cuda:0'), tensor(0.0288, device='cuda:0'))
```
...

> We can see that `lora_dropout` in the forward function works the same way whether in train or inference mode.

Did you try it out? The `nn.Dropout` layer is not...
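
For reference, a quick self-contained check of the standard PyTorch behavior: `nn.Dropout` is only active in training mode and becomes a no-op after `.eval()`:

```python
import torch
from torch import nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()
print(drop(x))                   # ~half the entries zeroed, survivors scaled to 2.0

drop.eval()
print(torch.equal(drop(x), x))   # True: dropout is a no-op in eval mode
```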

We had a PR for this once in #980, but there were a few non-trivial decisions to be made. If you want to work on a PR, please check out the discussion...