LoRA

why use alpha/r instead of alpha?

Open — dingguo1996 opened this issue 1 year ago · 2 comments

The paper says we scale the LoRA update BAx by alpha/r. But why use alpha/r instead of just alpha?

[screenshot of the relevant paragraph from the paper]
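For concreteness, the scaling being asked about is the alpha/r factor applied to the low-rank update in the forward pass. A minimal PyTorch sketch of a LoRA linear layer might look like the following (illustrative only, with made-up names; this is not the reference loralib implementation):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA linear layer -- an illustrative sketch, not loralib."""

    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        # Frozen pretrained weight W (random here just so the sketch runs).
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)
        # Low-rank factors: A starts small and random, B starts at zero,
        # so B @ A == 0 and the adapter is a no-op before training.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        # The scaling in question: alpha / r rather than alpha alone.
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # h = W x + (alpha / r) * B A x
        base = x @ self.weight.T
        lora = (x @ self.lora_A.T) @ self.lora_B.T
        return base + self.scaling * lora

# Quick shape check.
layer = LoRALinear(in_features=32, out_features=64, r=8, alpha=16)
print(layer(torch.randn(4, 32)).shape)  # torch.Size([4, 64])
```

With this factor, changing r automatically rescales the adapter's contribution, which is the behavior the question is about.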

dingguo1996 · Jul 20 '23 08:07

The magnitude of the preactivation after B is \Theta(r) after training with adaptive optimizers. Dividing by r stabilizes it and makes HP tuning easier as mentioned at the end of the paragraph.
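One rough way to unpack the \Theta(r) claim, written out as a heuristic sketch rather than a formal argument: decompose the preactivation after B into rank-one terms.

```latex
% Heuristic sketch (not a formal proof): decompose the LoRA preactivation.
\[
  B A x \;=\; \sum_{i=1}^{r} \left(a_i^{\top} x\right) b_i ,
\]
% where $a_i^{\top}$ is the $i$-th row of $A$ and $b_i$ the $i$-th column of $B$.
% With an adaptive optimizer such as Adam, each coordinate of $A$ and $B$
% receives steps of roughly the same size regardless of $r$, so each
% rank-one term $(a_i^{\top} x)\, b_i$ ends up with magnitude $\Theta(1)$
% after training, and the sum of $r$ such terms is $\Theta(r)$.
% The $\alpha/r$ factor cancels that growth:
\[
  \frac{\alpha}{r}\, B A x \;=\; \Theta(\alpha),
\]
% so the scaled update is governed by $\alpha$ alone, independent of $r$,
% which is why $\alpha$ can be tuned once and reused when sweeping over $r$.
```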

edwardjhu · Aug 05 '23 17:08

> The magnitude of the preactivation after B is \Theta(r) after training with adaptive optimizers. Dividing by r stabilizes it and makes HP tuning easier as mentioned at the end of the paragraph.

Thanks! But I'd like to know more about why "the magnitude of the preactivation after B is \Theta(r)". Could you share an explanation? @edwardjhu

chrisway613 · Aug 18 '23 10:08