Clarification on L1 Norm Regularization in Paper vs. Code Implementation
Hi, I'm a bit confused about the L1 norm as defined in the paper versus how it's implemented in the code. In the paper, the L1 norm seems to be defined in terms of the magnitudes of the activations, but in the code the regularization instead looks at input-output scaling, computing the ratio of the standard deviation of each edge's outputs to the standard deviation of its inputs. Could someone help clarify this? Am I missing something, or is this a deliberate change in the implementation?
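For reference, my reading of the paper's definition (paraphrasing from memory, so the notation may be slightly off) is that the L1 norm of an activation function is its average magnitude over the N_p input samples, and the layer-level norm sums this over all edges:

$$
|\phi|_1 \equiv \frac{1}{N_p}\sum_{s=1}^{N_p}\left|\phi\big(x^{(s)}\big)\right|,
\qquad
|\Phi|_1 \equiv \sum_{i=1}^{n_{\text{in}}}\sum_{j=1}^{n_{\text{out}}}\left|\phi_{i,j}\right|_1 .
$$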
Here are the code snippets where the L1 norm seems to be computed: the scale is built in forward and then summed in reg:
# MultKAN.py: forward, line 785~
x_numerical, preacts, postacts_numerical, postspline = self.act_fun[l](x)
if self.save_act:
    input_range = torch.std(preacts, dim=0) + 0.1
    output_range_spline = torch.std(postacts_numerical, dim=0)  # for training, only penalize the spline part
    self.acts_scale_spline.append(output_range_spline / input_range)

# MultKAN.py: reg, line 1294~
if reg_metric == 'edge_forward_spline_n':
    acts_scale = self.acts_scale_spline

vec = acts_scale[i]  # i indexes the layer; the surrounding loop is omitted in this excerpt
l1 = torch.sum(vec)
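To make the contrast concrete, here is a minimal sketch of the two quantities side by side. I'm assuming preacts and postacts_numerical have the shape (batch, out_dim, in_dim) that the layer's forward returns, and I'm using random tensors as stand-ins for the real activations; the names `paper_l1` and `code_l1` are mine, not from the repo.

```python
import torch

# Stand-ins for one layer's tensors (assumed shape: (batch, out_dim, in_dim)).
batch, out_dim, in_dim = 256, 5, 3
preacts = torch.randn(batch, out_dim, in_dim)    # edge inputs x
postacts = torch.randn(batch, out_dim, in_dim)   # edge outputs phi(x), spline part

# Paper-style L1 (as I read it): average |phi(x)| per edge, summed over edges.
paper_l1 = postacts.abs().mean(dim=0).sum()

# Code-style L1 ('edge_forward_spline_n'): ratio of output std to input std
# per edge, summed over edges -- this is what acts_scale_spline feeds into reg().
input_range = torch.std(preacts, dim=0) + 0.1
output_range_spline = torch.std(postacts, dim=0)
code_l1 = torch.sum(output_range_spline / input_range)

print(paper_l1.item(), code_l1.item())
```

If that reading is right, the implemented metric penalizes each edge's output spread relative to its input spread rather than its average magnitude, which is why I'm wondering whether this is an intentional departure from the paper.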