
Functional Approximation is poor

Open 1ssb opened this issue 9 months ago • 9 comments

Congratulations on this great piece of work. I have tried some simple tests like functional approximation, and it turns out that, across a variety of models, KANs perform poorly compared to standard MLPs.

It could be that my model is much simpler and therefore not capable enough, but it would be nice to see these standard comparisons, because they would show how broadly KANs can be adapted. Here is my implementation: https://github.com/1ssb/torchkan/blob/main/torchkan.py
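For concreteness, here is a minimal sketch of the kind of comparison meant here, not the linked torchkan code. The pykan calls (KAN, create_dataset, .train) follow the pykan README of that period, so treat the exact signatures as assumptions, and the target function is just a stand-in:

```python
# Fit the same 1-D target with a small pykan KAN and a plain MLP, then
# compare test MSE. API usage is a sketch based on the pykan README.
import torch
from kan import KAN, create_dataset

f = lambda x: torch.exp(-x[:, [0]] ** 2)   # stand-in smooth target
dataset = create_dataset(f, n_var=1)       # dict with train/test splits

kan = KAN(width=[1, 5, 1], grid=3, k=3)
kan.train(dataset, opt="LBFGS", steps=50)  # renamed .fit in later versions

mlp = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(
        mlp(dataset["train_input"]), dataset["train_label"])
    loss.backward()
    opt.step()

with torch.no_grad():
    kan_mse = torch.nn.functional.mse_loss(
        kan(dataset["test_input"]), dataset["test_label"])
    mlp_mse = torch.nn.functional.mse_loss(
        mlp(dataset["test_input"]), dataset["test_label"])
print(f"KAN test MSE: {kan_mse:.2e}  MLP test MSE: {mlp_mse:.2e}")
```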

1ssb avatar May 10 '24 14:05 1ssb

Here is a comparison between the models (same layer configurations; I used the same positional encodings for both):

[W&B charts, 11 May 2024: training curves for the two models]

Both of them are trying to approximate the inverse of the Gaussian function.

1ssb avatar May 10 '24 15:05 1ssb

Have you tried lowering the grid number to 3 or less? I had a similar problem in another issue, and it improved somewhat when I lowered the grid to 2.

pop756 avatar May 10 '24 16:05 pop756

Yes, I did; it does not help.

Also, since we are trying to move towards interpretability, let's take some additional onus on ourselves and explain why such changes would make any difference anyway. A smaller grid would mean fewer control points, right? But would that matter for a smooth function like an inverted Gaussian? Please feel free to correct me.
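For reference, one way to quantify the grid/control-point relation being asked about, assuming pykan's B-spline parametrization in which an order-k spline over `grid` intervals carries grid + k coefficients per edge:

```python
# Rough parameter count per KAN edge, assuming pykan's B-spline setup:
# an order-k spline on `grid` intervals has grid + k coefficients
# ("control points"). Lowering `grid` therefore means fewer control
# points per activation, i.e. a smoother, more constrained fit.
def edge_coeffs(grid: int, k: int = 3) -> int:
    return grid + k

for g in (2, 3, 5, 7):
    print(f"grid={g}: {edge_coeffs(g)} coefficients per edge")
```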

1ssb avatar May 10 '24 16:05 1ssb


If we want to fit a function such as e^(x^2+1), we can split it into two parts, x^2 + 1 and exp, and construct a KAN with two layers. With a layer configuration such as [1,2,1], the KAN can in theory represent the function exactly. However, with a different configuration, for example [1,3,1] or [1,3,2,1], the KAN does not recover that clean decomposition; extra edge functions contribute spurious terms to the output, and I thought this could cause overfitting.

When I configured the KAN layers as [1,5,2,1] with grid=2, the training loss went below 0.00002.

The MLP was configured as [1,64,64,1].
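A sketch of the two configurations described here, under the same pykan API assumptions as the earlier sketch:

```python
# [1,5,2,1] KAN with grid=2 vs. a [1,64,64,1] MLP on f(x) = e^(x^2 + 1).
import torch
from kan import KAN, create_dataset

f = lambda x: torch.exp(x[:, [0]] ** 2 + 1)
dataset = create_dataset(f, n_var=1)

kan = KAN(width=[1, 5, 2, 1], grid=2, k=3)
kan.train(dataset, opt="LBFGS", steps=50)  # reported training loss < 0.00002

mlp = torch.nn.Sequential(                 # [1, 64, 64, 1]
    torch.nn.Linear(1, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)
```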

pop756 avatar May 10 '24 18:05 pop756

Sorry, I should have been clearer. By an inverted Gaussian I mean: given f(x) = N(mu, sigma^2), the target is y = f^(-1). One does not have an explicit parameterised model as such, just samples from the inverse, which, depending on the mapping, is an R -> R^n function.
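One plausible way to generate the data described here (an assumption about the setup, not the poster's actual script): sample x, evaluate the Gaussian density y = f(x), and train on the pairs (y, x) so the network learns f^(-1). Restricting to one branch, x >= mu, keeps the inverse single-valued; keeping both branches instead gives the R -> R^n case mentioned above.

```python
# Samples from the inverse of a Gaussian density: inputs are density
# values y, targets are the x that produced them.
import math
import torch

mu, sigma = 0.0, 1.0
x = mu + sigma * torch.rand(1000, 1) * 3.0   # one branch: x >= mu
y = torch.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

inputs, targets = y, x                       # learn y -> x, i.e. f^(-1)
```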

1ssb avatar May 10 '24 18:05 1ssb

I'm sorry, I don't have the math knowledge to answer that question. But when I trained on the function I presented above, I held out the last 20% of the x range (i.e. x in [1.8, 3]) as the validation set and trained on the remaining 80% of the x data; the results for grid = 7 and grid = 3 are shown below. I didn't apply L1 regularization here. I thought it might help with your question.

[Plots: validation fit with grid = 7 and grid = 3]
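A sketch of this split, assuming the x range was [-3, 3] (so the last 20% is [1.8, 3]) and the same e^(x^2+1) target:

```python
# Train on x < 1.8, hold out x in [1.8, 3] for validation, using the
# pykan dataset dict layout from the earlier sketches.
import torch

x = torch.linspace(-3, 3, 1000).unsqueeze(1)
y = torch.exp(x ** 2 + 1)

train_mask = x[:, 0] < 1.8
dataset = {
    "train_input": x[train_mask],  "train_label": y[train_mask],
    "test_input":  x[~train_mask], "test_label":  y[~train_mask],
}
```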

pop756 avatar May 10 '24 18:05 pop756

I am fundamentally unsure what you mean by dividing it into two parts or sequences; can you kindly explain?

1ssb avatar May 10 '24 19:05 1ssb

[Plots: training region and held-out region] The first region was used as training data and the rest as validation data. Training like that, changing only the grid variable, and plotting the learning process gave the results posted above.

pop756 avatar May 10 '24 19:05 pop756

The errors I got were quite low, but they saturated around 0.003 and didn't go below that, while the MLPs reached the order of 1e-6.

1ssb avatar May 10 '24 20:05 1ssb