Ziming Liu
Hi, sorry, I don't think I quite get your question. My understanding is that one can replace MLPs with KANs in a transformer (it should be as simple as that),...
Hi, it is still Gaussian but just a part of it, so locally it looks like a quadratic function. The real problem with seed=55 looks like the regularization `lamb` is...
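To illustrate why a small piece of a Gaussian can pass for a quadratic, one can compare exp(-x^2) with its second-order Taylor expansion 1 - x^2 near the peak. This is only a sketch: the intervals and the function names below are assumptions chosen for illustration, not anything from the thread or from the library.

```python
import math


def gaussian(x: float) -> float:
    """Unnormalized Gaussian exp(-x^2)."""
    return math.exp(-x * x)


def quadratic_approx(x: float) -> float:
    """Second-order Taylor expansion of exp(-x^2) around x = 0."""
    return 1.0 - x * x


def max_abs_gap(lo: float, hi: float, n: int = 201) -> float:
    """Largest |gaussian - quadratic| over n evenly spaced points in [lo, hi]."""
    step = (hi - lo) / (n - 1)
    points = [lo + i * step for i in range(n)]
    return max(abs(gaussian(x) - quadratic_approx(x)) for x in points)


# Assumed intervals: a narrow window around the peak and a much wider one.
narrow_gap = max_abs_gap(-0.3, 0.3)
wide_gap = max_abs_gap(-2.0, 2.0)
print(f"max gap on [-0.3, 0.3]: {narrow_gap:.5f}")
print(f"max gap on [-2.0, 2.0]: {wide_gap:.5f}")
```

On the narrow window the two curves are nearly indistinguishable, while on the wide window they diverge, which is consistent with data covering only part of the Gaussian looking quadratic.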
Also, although the Gaussian function is fixed, the input and output affine transformations are learnable. With `keep_fit`, you make sure the affine transformations are always the identity. Glad that you solved...
I see that you have a bool `keep_fit`, that sounds reasonable. If you want, you can do a PR (please set `keep_fit=False` by default since it will slow down things...
Great suggestion, thanks!
Hi, I mean the latter "KAN takes 10X as much wall-clock time to run a single step of training (forward, backward, and gradient update) in comparison with a same-parameter MLP?"...
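A per-step wall-clock comparison like the one described can be measured with a small timing harness. The sketch below is a generic illustration, not the benchmark used in the thread: `time_step`, `cheap_step`, and `costly_step` are hypothetical names, and the two step functions are stand-ins for one full training step (forward, backward, and gradient update) of each model.

```python
import time
from statistics import median
from typing import Callable


def time_step(step: Callable[[], None], warmup: int = 3, repeats: int = 20) -> float:
    """Median wall-clock seconds for one call of `step`, after warm-up calls."""
    for _ in range(warmup):
        step()  # warm-up runs are not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        step()
        samples.append(time.perf_counter() - start)
    return median(samples)


# Hypothetical stand-ins for "one training step" of two models.
def cheap_step() -> None:
    sum(i * i for i in range(1_000))


def costly_step() -> None:
    sum(i * i for i in range(10_000))


cheap_time = time_step(cheap_step)
costly_time = time_step(costly_step)
ratio = costly_time / cheap_time
print(f"slowdown factor: {ratio:.1f}x")
```

If the step runs on a GPU, call `torch.cuda.synchronize()` before each clock read, since CUDA kernels launch asynchronously and the timer would otherwise stop too early.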
Hi, might be good to run a baseline model (say linear regression, MLP) to see how that performs. And then we can have a sense of how hard this problem...
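A baseline check of this kind could look like the sketch below, which compares a constant mean predictor with closed-form linear regression on held-out data. Everything here is assumed for illustration: the synthetic dataset, the 150/50 split, and the helper names `fit_line` and `rmse` are not from the thread. In practice the same comparison would be run on the actual dataset, with an MLP as a further baseline.

```python
import math
import random


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(preds, targets):
    """Root-mean-square error between predictions and targets."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets))


# Synthetic data, made up purely for illustration: y = 2x + 0.5 plus noise.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [2.0 * x + 0.5 + rng.gauss(0.0, 0.1) for x in xs]
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

# Baseline 1: always predict the training mean.
mean_train = sum(train_y) / len(train_y)
rmse_mean = rmse([mean_train] * len(test_y), test_y)

# Baseline 2: linear regression fitted on the training split.
slope, intercept = fit_line(train_x, train_y)
rmse_linear = rmse([slope * x + intercept for x in test_x], test_y)

print(f"mean predictor RMSE:    {rmse_mean:.3f}")
print(f"linear regression RMSE: {rmse_linear:.3f}")
```

If a more expressive model cannot beat these numbers on the real data, the problem is likely hard (or noisy) rather than the model being at fault.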
Could it be a problem with the torch version? Mine is `torch==2.2.2`.
Hi, that's a bit weird. I ran into a similar problem before, but that was in the very early stage of code development. I did find that trying driver...
That's correct, my implementation follows that wiki page. The indices in the code might be a bit different from the wiki page.