LoRA
How do you optimize B, which is the all-zero matrix in the LoRA method?
Thanks for the question! You can optimize B as usual, e.g., with Adam, since it will get a non-zero gradient in general.
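Here is a minimal sketch illustrating this, assuming the standard LoRA parameterization h = W₀x + BAx (the layer sizes, names `lora_A`/`lora_B`, and the toy loss are just for illustration). Even though B starts at zero, it receives a non-zero gradient on the very first backward pass, because its gradient depends on Ax and on the upstream gradient, not on B itself:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, r = 16, 4
x = torch.randn(2, d)

W = nn.Linear(d, d, bias=False)          # stands in for the frozen pretrained weight
W.weight.requires_grad_(False)

lora_A = nn.Parameter(torch.randn(r, d) * 0.01)  # A: random Gaussian init
lora_B = nn.Parameter(torch.zeros(d, r))         # B: zero init, so B @ A = 0 at the start

h = W(x) + x @ lora_A.t() @ lora_B.t()   # forward pass: W x + B A x
loss = h.pow(2).mean()                   # toy loss, just to drive a backward pass
loss.backward()

print(lora_B.grad.abs().sum())  # non-zero: dL/dB = (dL/dh) (A x)^T
print(lora_A.grad.abs().sum())  # zero on this first step only, because B is still 0
```

Note that on the very first step it is A whose gradient is zero (its gradient is multiplied by B), but as soon as B moves away from zero, both matrices are updated as usual.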
Isn't it the case that all rows of the B matrix are linearly dependent, making it effectively a rank-1 matrix? Could it simply be reduced to a product of two vectors?
Edit: Never mind. I went through the gradient update equations again (see below), and it seems like this should not happen as long as the other weights of the network are initialized randomly.
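For reference, a sketch of those gradients, assuming the standard LoRA forward pass $h = W_0 x + B A x$:

$$
\frac{\partial \mathcal{L}}{\partial B} = \frac{\partial \mathcal{L}}{\partial h}\,(A x)^{\top},
\qquad
\frac{\partial \mathcal{L}}{\partial A} = B^{\top}\,\frac{\partial \mathcal{L}}{\partial h}\,x^{\top}.
$$

The first update sets $B \leftarrow -\eta \sum_i \frac{\partial \mathcal{L}}{\partial h_i}(A x_i)^{\top}$, a sum of outer products over the batch, so B is not constrained to rank 1 in general; and once B is non-zero, A also starts receiving non-zero gradients.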
I have the same problem. When I run gradient backpropagation, the weights of A get updated, but the weights of B stay at 0. Could you tell me how to solve this? Thank you!
I ran into the same problem. Have you solved it?
This shouldn't happen. Can you elaborate on your setup?
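As a first check, you could run something like the sketch below after `loss.backward()` to confirm the B matrices are trainable and actually receive gradients (the name filter `"lora_B"` follows the PEFT naming convention; adjust it to your own layer names):

```python
import torch.nn as nn

def check_lora_B(model: nn.Module) -> None:
    """Print trainability and gradient norm for every LoRA B parameter."""
    for name, param in model.named_parameters():
        if "lora_B" in name:
            grad_norm = None if param.grad is None else param.grad.norm().item()
            print(f"{name}: requires_grad={param.requires_grad}, grad_norm={grad_norm}")
```

If `requires_grad` is False, or the gradient norm is always zero or None, the B parameters are likely frozen or missing from the optimizer's parameter groups.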