
After adding LoRA, the first few layers show a gradient of 0

hluckye opened this issue 3 months ago · 0 comments

I am a beginner in deep learning, and I would like to know whether the gradient is 0 because of vanishing gradients or because my data is too small (batch_size=32).

I tried to add LoRA to a three-layer neural network, but only the gradients of the lora_A and lora_B matrices in the last layer were non-zero (below 1e-2); the gradients in all the other layers were 0. My lora.Linear definitions are as follows:

```python
self.prednet_full1_lora = lora.Linear(self.prednet_input_len, self.prednet_len1, r=4)
self.prednet_full2_lora = lora.Linear(self.prednet_len1, self.prednet_len2, r=4)
self.prednet_full3_lora = lora.Linear(self.prednet_len2, 1, r=4)
```

The forward part of the model is shown below (assuming input_x is the input):

```python
input_x = torch.sigmoid(self.prednet_full1_lora.forward(input_x))
input_x = torch.sigmoid(self.prednet_full2_lora.forward(input_x))
output = torch.sigmoid(self.prednet_full3_lora.forward(input_x))
```
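For context, here is a minimal, self-contained sketch of the setup described above (the PredNet class name and the layer sizes are placeholders, not my real values):

```python
import torch
import torch.nn as nn
import loralib as lora

class PredNet(nn.Module):
    """Placeholder three-layer net mirroring the setup above; sizes are made up."""
    def __init__(self, prednet_input_len=128, prednet_len1=64, prednet_len2=32):
        super().__init__()
        self.prednet_full1_lora = lora.Linear(prednet_input_len, prednet_len1, r=4)
        self.prednet_full2_lora = lora.Linear(prednet_len1, prednet_len2, r=4)
        self.prednet_full3_lora = lora.Linear(prednet_len2, 1, r=4)

    def forward(self, input_x):
        input_x = torch.sigmoid(self.prednet_full1_lora(input_x))
        input_x = torch.sigmoid(self.prednet_full2_lora(input_x))
        return torch.sigmoid(self.prednet_full3_lora(input_x))

net = PredNet()
# Standard loralib usage would also freeze the base weights so that only
# the LoRA matrices train, e.g.:
# lora.mark_only_lora_as_trainable(net)
```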

I did not forget to call:

```python
loss.backward()
optimizer.step()
net.apply_clipper()
```
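To see which layers get a signal, I print the gradient norm of every LoRA parameter right after loss.backward() (a diagnostic sketch, not my exact training code; it assumes the model object is called net):

```python
# Diagnostic sketch: list the gradient norm of every LoRA parameter so the
# layers whose gradients stay at 0 are easy to spot.
for name, param in net.named_parameters():
    if 'lora_' in name:  # loralib names the adapter matrices lora_A / lora_B
        norm = param.grad.norm().item() if param.grad is not None else float('nan')
        print(f"{name}: grad norm = {norm:.3e}")
```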

I would greatly appreciate any ideas or solutions.

hluckye · Mar 23 '24 16:03