LoRA
A/B matrix initialization in layers.py does not conform to the description in the paper
"We use a random Gaussian initialization for A and zero for B,” in paper but: ` def reset_parameters(self):
nn.Embedding.reset_parameters(self)
if hasattr(self, 'lora_A'):
# initialize A the same way as the default for nn.Linear and B to zero
nn.init.zeros_(self.lora_A)
nn.init.normal_(self.lora_B)
` in layers.py
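For reference, a minimal self-contained sketch of what initialization following the paper's description (Gaussian A, zero B) might look like. This is an illustration only, not the repo's actual class; the shapes of `lora_A` and `lora_B` are chosen for the example:

```python
import torch
import torch.nn as nn

class LoRAEmbedding(nn.Embedding):
    """Hypothetical sketch: an embedding with LoRA factors
    initialized as the paper describes (Gaussian A, zero B)."""

    def __init__(self, num_embeddings, embedding_dim, r=4):
        super().__init__(num_embeddings, embedding_dim)
        # Shapes chosen for illustration only.
        self.lora_A = nn.Parameter(torch.empty(r, num_embeddings))
        self.lora_B = nn.Parameter(torch.empty(embedding_dim, r))
        # Paper: "random Gaussian initialization for A and zero for B"
        nn.init.normal_(self.lora_A)
        nn.init.zeros_(self.lora_B)  # B @ A == 0, so LoRA starts as a no-op
```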
Hi Jinxin,
We didn't apply LoRA to embedding layers in the paper. In any case, it shouldn't make a meaningful difference whether A or B is initialized to zero, as long as the other one is not zero. Let me know if you see a substantial difference, though!
@edwardjhu can you please tell us why at least one of A or B has to be non-zero?
Maybe the paper says that? It ensures that, at the beginning of the training phase, the matrix product of LoRA's A and B is 0. Maybe it's to keep training stable at the start.
We want at least one of the matrices to be zero so that LoRA in the first forward pass is a no-op, which indeed stabilizes training. Say we are generating some content with a LM. If both matrices are non-zero, the random LoRA init, if large enough, might move the entire model so far from the original that we start generating garbage, which is bad for training.
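To make this concrete, here's a small standalone check (an illustration, not code from the repo): with either factor zeroed, the update `B @ A` is exactly the zero matrix, so the adapted weight reproduces the base model on the first pass; with both factors random, the output drifts immediately.

```python
import torch

torch.manual_seed(0)
d, r = 8, 2
W = torch.randn(d, d)    # frozen pretrained weight
x = torch.randn(1, d)    # an arbitrary input

# One factor zero: the update B @ A is exactly 0, so the
# first forward pass is identical to the base model.
A = torch.randn(r, d)
B = torch.zeros(d, r)
print(torch.allclose(x @ (W + B @ A).T, x @ W.T))    # True

# Both factors random: the output immediately differs from the base model.
B2 = torch.randn(d, r)
print(torch.allclose(x @ (W + B2 @ A).T, x @ W.T))   # False
```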