FYI: convolutional layer with KAN
https://github.com/StarostinV/convkan
A basic yet efficient (thanks to efficient-kan) implementation of the convolution operation with KAN, built on F.unfold.
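(For readers curious how the unfold trick works, here is a minimal sketch of the idea, not convkan's actual code: F.unfold extracts every K×K patch as a flat vector, so any module that maps (N, C_in·K·K) to (N, C_out) can play the role of the kernel. The class name below is illustrative; nn.Linear is used as a stand-in so the snippet runs on its own, and swapping in a KAN layer, e.g. efficient-kan's KANLinear, would make this a KAN convolution.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchwiseConv2d(nn.Module):
    """Applies `kernel_layer` to every flattened K x K patch (illustrative sketch)."""
    def __init__(self, kernel_layer, kernel_size, stride=1, padding=0):
        super().__init__()
        self.kernel_layer = kernel_layer  # maps (N, C_in*K*K) -> (N, C_out)
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding

    def forward(self, x):
        B, _, H, W = x.shape
        # (B, C_in*K*K, L): each column is one flattened patch
        patches = F.unfold(x, self.kernel_size, stride=self.stride, padding=self.padding)
        L = patches.shape[-1]
        out = self.kernel_layer(patches.transpose(1, 2).reshape(B * L, -1))
        H_out = (H + 2 * self.padding - self.kernel_size) // self.stride + 1
        W_out = (W + 2 * self.padding - self.kernel_size) // self.stride + 1
        return out.reshape(B, L, -1).transpose(1, 2).reshape(B, -1, H_out, W_out)

# nn.Linear makes this an ordinary convolution; a KAN layer with the same
# input/output sizes would make it a KAN convolution.
conv = PatchwiseConv2d(nn.Linear(3 * 3 * 3, 8), kernel_size=3, padding=1)
print(conv(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 8, 32, 32])
```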
Are you really sure you coded all that on your own, my friend? 😉
I am pretty sure. Please share the link so that we can compare the implementations. Everybody would benefit from that!
EDIT: I found it, looks good! My implementation supports grouped convolutions and is tested, but otherwise it is very similar.
Be my guest mate https://github.com/paulestano/LeKan
I was not aware that one could use unfold as a module. However, your implementation lacks support for padding_mode and groups, and it has not been thoroughly tested. In contrast, my implementation serves as a drop-in replacement for Conv2d. Sharing the code for the benefit of everyone is more productive than making accusations of theft. Frankly, implementing convolution with KAN is an obvious idea. The main reason I did it was my surprise that nobody else had done so, especially given that others have argued it's impossible in another open issue on the package. Cheers!
except I had done it 4 days ago and shared it in the said issue (https://github.com/KindXiaoming/pykan/issues/9#issuecomment-2097866072) 2 days ago… I even described it in words very similar to your issue's... How surprising is that?
On a more scientific note, I can't wait for you to share convincing results on CIFAR. Unless that is obvious as well 😉
I meant this comment. As you can see, it was made two days ago, and they also didn't know about your code. Seriously, if you think that your GitHub account and your comment are so visible, I don't know what to say. For instance, there are dozens of independent implementations of efficient-kan; are you gonna accuse them of stealing ideas, too? I am trying to be polite, but this is just nonsense.
Concurrent work happens, but the phrasing as well as the timeline are unfortunate here. Everyone will make up their own mind…
Could you guys please explain what you mean by implementing a "conv layer in KAN"? KAN is the equivalent of a dense layer; a conv layer is an operation defined in mathematics. How can you implement an operation in KAN, and why would you do it?
It seems more plausible to replace the classification dense layers with KAN, but the feature extraction?
@hesamsheikh Basically, you can rephrase a convolution operator as a simple matrix multiplication, just as in a Linear layer. That's why, if we can replace the normal Linear layer with KAN, we can replace the convolution layer too.
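(A quick numerical check of this claim, using only standard PyTorch; a sketch, not anyone's repo code: unfolded patches times the flattened kernel weights reproduces F.conv2d exactly, and that matrix multiplication is the slot a KAN layer could fill.)

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
w = torch.randn(4, 3, 3, 3)                      # (C_out, C_in, K, K)

ref = F.conv2d(x, w, padding=1)

patches = F.unfold(x, kernel_size=3, padding=1)  # (1, 27, 64), one column per patch
out = w.view(4, -1) @ patches                    # matmul replaces the sliding window
out = out.view(1, 4, 8, 8)

print(torch.allclose(ref, out, atol=1e-5))       # True
```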
But there's a reason we're not substituting Conv layers with Linear entirely. KANs aren't local or spatial in nature, so what makes them suitable to take the place of Conv?
@hesamsheikh I'm not saying they are the same; Linear isn't local or spatial either. I'm just saying that we can implement the convolution operation using a matrix multiplication (with some rearrangement of the input, of course). That's what encourages us to think of a way to incorporate KAN into convolution.
KAN just rephrases how the next layer's features are computed: instead of taking a weighted sum of the input features and then applying some activation, we take a weighted sum of B-spline basis functions of the inputs, which is a better way to interpret how a NN works. Then, instead of letting the whole input space contribute (as in a Linear layer), we look only at a neighborhood of features and "judge" every neighborhood in the same way; what we get should behave like a convolution layer.
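(To make the contrast concrete, here is a toy KAN-style layer; a sketch only, using a Gaussian radial basis in place of proper B-splines for brevity, so the class name and details are illustrative rather than pykan's API. Each edge (i, j) carries its own learnable univariate function φ_ij, and output j is Σ_i φ_ij(x_i) rather than act(Σ_i w_ij·x_i).)

```python
import torch
import torch.nn as nn

class ToyKANLayer(nn.Module):
    def __init__(self, in_dim, out_dim, n_basis=8):
        super().__init__()
        # shared Gaussian basis centers on a fixed grid
        self.register_buffer('centers', torch.linspace(-2.0, 2.0, n_basis))
        # one coefficient vector per edge (i, j): shape (out_dim, in_dim, n_basis)
        self.coef = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, n_basis))

    def forward(self, x):                          # x: (batch, in_dim)
        # evaluate every basis function at every input: (batch, in_dim, n_basis)
        basis = torch.exp(-(x.unsqueeze(-1) - self.centers) ** 2)
        # phi_ij(x_i) = coef_ij . basis(x_i); sum over inputs i for each output j
        return torch.einsum('bik,oik->bo', basis, self.coef)

layer = ToyKANLayer(in_dim=5, out_dim=3)
print(layer(torch.randn(4, 5)).shape)              # torch.Size([4, 3])
```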
There might be some confusion here: convolution is of course not limited to the typical implementation of an affine transformation followed by an activation function. One can choose any trainable function as a kernel, and KANs are no exception. When a KAN is used as the convolutional kernel, the operation is of course still "local". On the other hand, there is nothing inherently local about the standard kernels used in Conv layers either. So it's not that we're doing something strange just because convolution can be performed as a matrix multiplication; we simply use KAN as a convolutional kernel.
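(Taking "any trainable function as a kernel" literally, here is a naive sliding-window sketch; illustrative only, and far slower than the unfold approach above. Locality comes from the sliding window, not from what f is: f could be a dot product with weights, i.e. a standard convolution, a KAN layer, or anything else.)

```python
import torch

def generic_conv2d(x, f, k):                 # x: (C, H, W); f: flat patch -> (C_out,)
    H, W = x.shape[1] - k + 1, x.shape[2] - k + 1
    cols = [f(x[:, i:i + k, j:j + k].reshape(-1))
            for i in range(H) for j in range(W)]
    return torch.stack(cols, dim=-1).reshape(-1, H, W)

x = torch.randn(3, 6, 6)
f = lambda patch: torch.stack([patch.sum(), patch.max()])  # any function of a patch
print(generic_conv2d(x, f, k=3).shape)       # torch.Size([2, 4, 4])
```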
So, to make sure I'm getting this right: you're using KAN as the kernel function of the convolution?
Yep, that's the whole idea.
Hi, here I implement ConvKAN with different activation formulations, together with their corresponding inference times, and evaluate the results on the CIFAR-10 dataset: https://github.com/XiangboGaoBarry/ConvKAN-Zoo
You can also make a pull request to add your repo to this collection of KANs: https://github.com/mintisan/awesome-kan