Clarification on L1 Norm Regularization in Paper vs. Code Implementation
Hi, I'm a bit confused about the L1 norm as defined in the paper versus how it's implemented in the code. In the paper, the L1 norm seems to be defined in terms of the magnitudes of the activations, but in the code the regularization instead looks at input-output scaling, computing the ratio of the standard deviation of each edge's outputs to the standard deviation of its inputs. Could someone help clarify this? Am I missing something, or is this a deliberate change in the implementation?
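For reference, my reading of the paper's definition (paraphrasing from memory, so the notation may be slightly off) is that the L1 norm of an activation function is its average magnitude over the N_p input samples, and the layer-level norm sums this over all edges:

$$
|\phi|_1 \equiv \frac{1}{N_p}\sum_{s=1}^{N_p}\left|\phi\big(x^{(s)}\big)\right|,
\qquad
|\Phi|_1 \equiv \sum_{i=1}^{n_{\text{in}}}\sum_{j=1}^{n_{\text{out}}}\left|\phi_{i,j}\right|_1 .
$$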
Here are the code snippets where the L1 norm seems to be computed: the scale is built in forward and then summed in reg:
# MultKAN.py: forward, line 785~
x_numerical, preacts, postacts_numerical, postspline = self.act_fun[l](x)
if self.save_act:
    input_range = torch.std(preacts, dim=0) + 0.1
    output_range_spline = torch.std(postacts_numerical, dim=0)  # for training, only penalize the spline part
    self.acts_scale_spline.append(output_range_spline / input_range)

# MultKAN.py: reg, line 1294~
if reg_metric == 'edge_forward_spline_n':
    acts_scale = self.acts_scale_spline

vec = acts_scale[i]  # i indexes the layer; the surrounding loop is omitted in this excerpt
l1 = torch.sum(vec)
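To make the contrast concrete, here is a minimal sketch of the two quantities side by side. I'm assuming preacts and postacts_numerical have the shape (batch, out_dim, in_dim) that the layer's forward returns, and I'm using random tensors as stand-ins for the real activations; the names `paper_l1` and `code_l1` are mine, not from the repo.

```python
import torch

# Stand-ins for one layer's tensors (assumed shape: (batch, out_dim, in_dim)).
batch, out_dim, in_dim = 256, 5, 3
preacts = torch.randn(batch, out_dim, in_dim)    # edge inputs x
postacts = torch.randn(batch, out_dim, in_dim)   # edge outputs phi(x), spline part

# Paper-style L1 (as I read it): average |phi(x)| per edge, summed over edges.
paper_l1 = postacts.abs().mean(dim=0).sum()

# Code-style L1 ('edge_forward_spline_n'): ratio of output std to input std
# per edge, summed over edges -- this is what acts_scale_spline feeds into reg().
input_range = torch.std(preacts, dim=0) + 0.1
output_range_spline = torch.std(postacts, dim=0)
code_l1 = torch.sum(output_range_spline / input_range)

print(paper_l1.item(), code_l1.item())
```

If that reading is right, the implemented metric penalizes each edge's output spread relative to its input spread rather than its average magnitude, which is why I'm wondering whether this is an intentional departure from the paper.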