LLM-Shearing
Default Initialization of Lambda Parameters to Zero
Hi! Great work!
I have a question about the default values of the lambda parameters. I noticed that they are initialized to zero:
lambda_1_layer = torch.nn.Parameter(torch.tensor(0.0, device=self.device))
Given that the Lagrangian loss is calculated using these parameters as follows:
lagrangian_loss = lambda_1 * (expected_sparsity - target_sparsity) + lambda_2 * (expected_sparsity - target_sparsity) ** 2
Initializing lambda_1 and lambda_2 to zero seems to imply that the Lagrangian loss term is zero at initialization, so there is no penalty for deviating from the target sparsity unless the lambdas are updated afterwards.
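To make the concern concrete, here is a minimal sketch in plain Python (not the repository's code). The sparsity values, the learning rate `lr`, and the helper names `lagrangian_loss` and `lambda_grads` are all hypothetical. It shows that the loss is zero while both lambdas are zero, but that the gradient of the loss with respect to each lambda is generally nonzero, so a gradient-ascent step on the lambdas would make the penalty active. Whether the training code actually performs such a step is an assumption here, and it is exactly what I am asking about.

```python
def lagrangian_loss(lambda_1, lambda_2, expected_sparsity, target_sparsity):
    """Lagrangian term as written in the formula above."""
    gap = expected_sparsity - target_sparsity
    return lambda_1 * gap + lambda_2 * gap ** 2


def lambda_grads(expected_sparsity, target_sparsity):
    """Analytic gradients of the loss w.r.t. lambda_1 and lambda_2."""
    gap = expected_sparsity - target_sparsity
    return gap, gap ** 2


# Hypothetical values, chosen only for illustration.
expected_sparsity, target_sparsity = 0.1, 0.5

# Both multipliers start at zero, as in the default initialization.
lambda_1, lambda_2 = 0.0, 0.0
loss_at_init = lagrangian_loss(lambda_1, lambda_2, expected_sparsity, target_sparsity)

# The gradients w.r.t. the lambdas do not depend on the lambdas themselves,
# so they are nonzero whenever expected_sparsity != target_sparsity.
grad_1, grad_2 = lambda_grads(expected_sparsity, target_sparsity)

# Assumption: the lambdas are trained by gradient ASCENT (maximizing the loss).
lr = 1.0  # hypothetical learning rate
lambda_1 += lr * grad_1
lambda_2 += lr * grad_2
loss_after_step = lagrangian_loss(lambda_1, lambda_2, expected_sparsity, target_sparsity)

print("loss at init:", loss_at_init)
print("grads w.r.t. lambdas:", grad_1, grad_2)
print("loss after one ascent step:", loss_after_step)
```

If the lambdas were instead left out of the optimizer or kept frozen, the term would stay at zero for the whole run.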
So, is it intended for the lambda parameters to be initialized to zero? Or is there another section of the code where these parameters are set or updated after initialization? I would appreciate any clarification or insight you can provide.