LLM-Shearing Default Initialization of Lambda Parameters to Zero

Open lpyhdzx opened this issue 8 months ago • 3 comments

Hi! Great work! I have a question about the default value of the lambda parameters. I've noticed that they are initialized to zero by default:

    lambda_1_layer = torch.nn.Parameter(torch.tensor(0.0, device=self.device))

The Lagrangian loss is calculated from these parameters as follows:

    lagrangian_loss = lambda_1 * (expected_sparsity - target_sparsity) + lambda_2 * (expected_sparsity - target_sparsity) ** 2

With lambda_1 and lambda_2 both at zero, the Lagrangian loss term evaluates to zero, so there appears to be no penalty for deviating from the target sparsity.

So, is it intended for the lambda parameters to be initialized to zero? Or is there another section of the code where these parameters are set or adjusted after initialization? I would appreciate any clarification or insight you can provide.
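To make the question concrete, here is a minimal sketch in plain Python (not the repository's code) of the loss formula quoted above. It shows that the loss is zero at initialization, but its gradient with respect to each lambda is not, so the lambdas would move away from zero if an optimizer updates them. The gradient-ascent update, the helper names (`lambda_gradients`, `ascent_step`), the learning rate, and the sparsity values are all assumptions for illustration; whether LLM-Shearing trains the lambdas this way is exactly what is being asked.

```python
def lagrangian_loss(lambda_1, lambda_2, expected_sparsity, target_sparsity):
    # Same formula as quoted in the issue.
    gap = expected_sparsity - target_sparsity
    return lambda_1 * gap + lambda_2 * gap ** 2


def lambda_gradients(expected_sparsity, target_sparsity):
    # Hand-derived partial derivatives of the loss:
    #   d(loss)/d(lambda_1) = gap
    #   d(loss)/d(lambda_2) = gap ** 2
    # Neither depends on the current lambda values.
    gap = expected_sparsity - target_sparsity
    return gap, gap ** 2


def ascent_step(lambda_1, lambda_2, expected_sparsity, target_sparsity, lr=1.0):
    # Hypothetical update: gradient ASCENT on the lambdas (assumed, not
    # confirmed for this repository).
    g1, g2 = lambda_gradients(expected_sparsity, target_sparsity)
    return lambda_1 + lr * g1, lambda_2 + lr * g2


# Illustrative values only: the model is less sparse than the target.
expected, target = 0.2, 0.5

lambda_1, lambda_2 = 0.0, 0.0  # zero initialization, as in the issue
loss_before = lagrangian_loss(lambda_1, lambda_2, expected, target)

lambda_1, lambda_2 = ascent_step(lambda_1, lambda_2, expected, target)
loss_after = lagrangian_loss(lambda_1, lambda_2, expected, target)

print("loss at init:", loss_before)
print("lambdas after one step:", lambda_1, lambda_2)
print("loss after one step:", loss_after)
```

Under these assumptions, zero initialization only means the penalty starts at zero. Once the lambdas are updated while the sparsity gap is nonzero, the penalty becomes active.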

Jun 06 '24 03:06 lpyhdzx
