FYI: convolutional layer with KAN

Open StarostinV opened this issue 9 months ago • 17 comments

https://github.com/StarostinV/convkan

A basic yet efficient (thanks to efficient-kan) implementation of the convolution operation with KAN, built using F.unfold.

StarostinV avatar May 09 '24 19:05 StarostinV
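
For readers curious how the F.unfold trick works, here is a minimal sketch of the idea, not the code from either repository linked in this thread: unfold extracts every receptive field as a flat vector, and a KAN layer applied to each patch produces the output channels. The kan_layer argument is assumed to be any module mapping in_channels * k * k features to out_channels (for example efficient-kan's KANLinear), passed in by the caller.

```python
import torch.nn as nn
import torch.nn.functional as F

class KANConv2d(nn.Module):
    """Sketch of a KAN convolution: unfold receptive fields, apply a KAN layer to each patch.

    `kan_layer` is any module mapping (in_channels * k * k) -> out_channels,
    e.g. a KANLinear from efficient-kan (not provided here).
    """
    def __init__(self, kan_layer, kernel_size, stride=1, padding=0):
        super().__init__()
        self.kan = kan_layer
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding

    def forward(self, x):
        b, c, h, w = x.shape
        # (B, C*k*k, L): each column is one flattened k x k receptive field
        patches = F.unfold(x, self.kernel_size, stride=self.stride, padding=self.padding)
        n_patches = patches.shape[-1]
        # Apply the KAN layer to every patch independently
        out = self.kan(patches.transpose(1, 2).reshape(b * n_patches, -1))  # (B*L, C_out)
        out = out.reshape(b, n_patches, -1).transpose(1, 2)                 # (B, C_out, L)
        h_out = (h + 2 * self.padding - self.kernel_size) // self.stride + 1
        w_out = (w + 2 * self.padding - self.kernel_size) // self.stride + 1
        return out.reshape(b, -1, h_out, w_out)
```

Swapping in nn.Linear(in_channels * k * k, out_channels) for the KAN layer recovers an ordinary convolution, which is a convenient sanity check for the unfolding logic.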

Are you really sure you coded all that on your own, my friend? 😉 [image: IMG_3922]

paulestano avatar May 10 '24 10:05 paulestano

I am pretty sure. Please share the link so that we can compare the implementation. Everybody would benefit from that!

EDIT: I found it, looks good! My implementation supports grouped convolutions and is tested, but otherwise it is very similar.

StarostinV avatar May 10 '24 12:05 StarostinV

Be my guest mate https://github.com/paulestano/LeKan

paulestano avatar May 10 '24 12:05 paulestano

Be my guest mate https://github.com/paulestano/LeKan

I was not aware that one could use unfold as a module. However, your implementation lacks support for padding_mode and groups, and it has not been thoroughly tested. In contrast, my implementation serves as a direct replacement for Conv2d. Sharing the code for the benefit of everyone is more productive than making accusations of theft. Frankly, it's an obvious idea to implement convolution with KAN. The main reason I did it was my surprise that nobody else had done so, especially given that others have argued it's impossible in another open issue on the package. Cheers!

StarostinV avatar May 10 '24 12:05 StarostinV

The main reason I did it was my surprise that nobody else had done so, especially given that others have argued it's impossible in another open issue on the package

except I had done it 4 days ago and shared it in that very issue (https://github.com/KindXiaoming/pykan/issues/9#issuecomment-2097866072) 2 days ago… I even described it in very similar words to the ones in your issue... How surprising is that?

paulestano avatar May 10 '24 12:05 paulestano

On a more scientific note, I can't wait for you to share convincing results on CIFAR. Unless that is obvious as well 😉

paulestano avatar May 10 '24 12:05 paulestano

The main reason I did it was my surprise that nobody else had done so, especially given that others have argued it's impossible in another open issue on the package

except I had done it 4 days ago and shared it in that very issue (#9 (comment)) 2 days ago… I even described it in very similar words to the ones in your issue... How surprising is that?

I meant this comment. As you can see, it was made two days ago, and they also didn't know about your code. Seriously, if you think your GitHub account and your comment are that visible, I don't know what to say. For instance, there are dozens of independent implementations of efficient-kan - are you gonna accuse them of stealing ideas, too? I am trying to be polite, but this is just nonsense.

StarostinV avatar May 10 '24 12:05 StarostinV

Concurrent work happens, but the phrasing as well as the timeline are unfortunate here. Everyone will make up their own mind…

paulestano avatar May 10 '24 13:05 paulestano

Could you guys please explain what you mean by implementing a "conv layer in KAN"? KAN is the equivalent of a dense layer, while a conv layer is an operation defined in mathematics. How can you implement an operation in KAN, and why would you do it?

It seems more plausible to replace the classification dense layers with KAN, but the feature extraction as well?

hesamsheikh avatar May 10 '24 14:05 hesamsheikh

@hesamsheikh Basically, you can rephrase a convolution operator as a simple matrix multiplication, as in a Linear layer; so if we can replace the normal Linear layer with KAN, we can do the same for the convolution layer.

minh-nguyenhoang avatar May 10 '24 14:05 minh-nguyenhoang
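
For anyone who wants to see the "convolution is just a matrix multiplication on a rearranged input" claim concretely, here is a small self-contained check in plain PyTorch (no KAN involved): multiplying the flattened Conv2d weight with the unfolded patches reproduces nn.Conv2d.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(2, 3, 8, 8)                      # (B, C_in, H, W)
conv = nn.Conv2d(3, 5, kernel_size=3, bias=False)

# Reference: the usual convolution
ref = conv(x)                                    # (2, 5, 6, 6)

# The same thing as a matrix multiplication on unfolded patches
patches = F.unfold(x, kernel_size=3)             # (B, C_in*3*3, L) with L = 6*6
weight = conv.weight.reshape(5, -1)              # (C_out, C_in*3*3)
out = (weight @ patches).reshape(2, 5, 6, 6)     # (B, C_out, L) -> (B, C_out, H_out, W_out)

print(torch.allclose(ref, out, atol=1e-6))       # True
```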

@hesamsheikh Basically, you can rephrase a convolution operator as a simple matrix multiplication, as in a Linear layer; so if we can replace the normal Linear layer with KAN, we can do the same for the convolution layer.

But there's a reason we're not substituting Conv layers with Linear completely. KAN isn't local or spatial in nature, so what makes it a suitable replacement for Conv?

hesamsheikh avatar May 10 '24 15:05 hesamsheikh

@hesamsheikh Basically, you can rephrase a convolution operator as a simple matrix multiplication, as in a Linear layer; so if we can replace the normal Linear layer with KAN, we can do the same for the convolution layer.

But there's a reason we're not substituting Conv layers with Linear completely. KAN isn't local or spatial in nature, so what makes it a suitable replacement for Conv?

@hesamsheikh I'm not saying they are the same; a Linear layer isn't local or spatial either. I'm just saying that we can implement the convolution operation as a matrix multiplication (with some rearrangement of the input, of course). That's what encourages us to think about a way to incorporate KAN into convolution.

KAN just rephrases how the next layer's features are computed: instead of taking a weighted sum of the input features and then applying some activation, we take a weighted sum of B-spline functions, which is a better way to interpret how a NN works. Then, instead of treating the whole input space as potential contributors (as in a Linear layer), we only look at a neighborhood of features, and since the way we "judge" every neighborhood is the same, what we get should be similar to a convolution layer.

minh-nguyenhoang avatar May 10 '24 15:05 minh-nguyenhoang
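
To make the "weighted sum of basis functions per edge" picture concrete, here is a toy layer in that spirit. It uses Gaussian radial basis functions in place of the B-splines from the KAN paper, so treat it as an illustration of the structure only, not pykan's or efficient-kan's actual layer.

```python
import torch
import torch.nn as nn

class ToyKANLayer(nn.Module):
    """Each edge (i, j) applies its own learnable 1-D function phi_ij(x_i),
    written as a weighted sum of fixed basis functions; outputs sum over i."""
    def __init__(self, in_features, out_features, num_basis=8, x_min=-2.0, x_max=2.0):
        super().__init__()
        self.register_buffer("centers", torch.linspace(x_min, x_max, num_basis))
        self.width = (x_max - x_min) / (num_basis - 1)
        # one coefficient per (output, input, basis function)
        self.coeffs = nn.Parameter(torch.randn(out_features, in_features, num_basis) * 0.1)

    def forward(self, x):                       # x: (B, in_features)
        # Gaussian basis evaluated at every input: (B, in_features, num_basis)
        basis = torch.exp(-((x.unsqueeze(-1) - self.centers) / self.width) ** 2)
        # phi_ij(x_i) = sum_k c_ijk * b_k(x_i); then sum over inputs i
        return torch.einsum("bik,oik->bo", basis, self.coeffs)

layer = ToyKANLayer(4, 3)
print(layer(torch.randn(5, 4)).shape)           # torch.Size([5, 3])
```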

@hesamsheikh Basically, you can rephrase a convolution operator as a simple matrix multiplication, as in a Linear layer; so if we can replace the normal Linear layer with KAN, we can do the same for the convolution layer.

But there's a reason we're not substituting Conv layers with Linear completely. KAN isn't local or spatial in nature, so what makes it a suitable replacement for Conv?

There might be some confusion here: convolutional operations are of course not limited to the typical implementation, i.e. an affine transformation followed by an activation function. One can choose any trainable function as a kernel to perform convolution, and KANs are no exception. When a KAN is used as a convolutional kernel, the operation is of course still "local". On the other hand, there is nothing inherently local about the standard kernels used in Conv layers either. So it is not that we are doing something strange just because convolution can be written as a matrix multiplication; we are simply using a KAN as the convolutional kernel.

StarostinV avatar May 10 '24 21:05 StarostinV
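
The point that locality comes from the sliding window rather than from the kernel function itself can be shown with a tiny generic helper; any trainable module mapping a flattened patch to output channels slots in, whether a Linear layer (ordinary convolution), an MLP, or a KAN layer. This is an illustrative sketch, not code from the repositories above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def patchwise_conv2d(x, kernel_fn, kernel_size, stride=1, padding=0):
    """Apply an arbitrary trainable function to every k x k receptive field.
    Locality comes from the sliding window, not from the function itself."""
    b, c, h, w = x.shape
    patches = F.unfold(x, kernel_size, stride=stride, padding=padding)   # (B, C*k*k, L)
    n_patches = patches.shape[-1]
    out = kernel_fn(patches.transpose(1, 2).reshape(b * n_patches, -1))  # (B*L, C_out)
    h_out = (h + 2 * padding - kernel_size) // stride + 1
    w_out = (w + 2 * padding - kernel_size) // stride + 1
    return out.reshape(b, n_patches, -1).transpose(1, 2).reshape(b, -1, h_out, w_out)

x = torch.randn(2, 3, 8, 8)
# An ordinary conv corresponds to kernel_fn = nn.Linear; any other trainable
# function (an MLP here, a KAN layer in the repos above) is just as "local".
linear_kernel = nn.Linear(3 * 3 * 3, 5)
mlp_kernel = nn.Sequential(nn.Linear(3 * 3 * 3, 16), nn.GELU(), nn.Linear(16, 5))
print(patchwise_conv2d(x, linear_kernel, 3).shape)   # torch.Size([2, 5, 6, 6])
print(patchwise_conv2d(x, mlp_kernel, 3).shape)      # torch.Size([2, 5, 6, 6])
```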

@hesamsheikh Basically, you can rephrase a convolution operator as a simple matrix multiplication, as in a Linear layer; so if we can replace the normal Linear layer with KAN, we can do the same for the convolution layer.

But there's a reason we're not substituting Conv layers with Linear completely. KAN isn't local or spatial in nature, so what makes it a suitable replacement for Conv?

There might be some confusion here: convolutional operations are of course not limited to the typical implementation, i.e. an affine transformation followed by an activation function. One can choose any trainable function as a kernel to perform convolution, and KANs are no exception. When a KAN is used as a convolutional kernel, the operation is of course still "local". On the other hand, there is nothing inherently local about the standard kernels used in Conv layers either. So it is not that we are doing something strange just because convolution can be written as a matrix multiplication; we are simply using a KAN as the convolutional kernel.

so to make sure I'm getting this right, you're using KAN as the kernel function of the convolution?

hesamsheikh avatar May 10 '24 21:05 hesamsheikh

@hesamsheikh Basically, you can rephrase a convolution operator as a simple matrix multiplication, as in a Linear layer; so if we can replace the normal Linear layer with KAN, we can do the same for the convolution layer.

But there's a reason we're not substituting Conv layers with Linear completely. KAN isn't local or spatial in nature, so what makes it a suitable replacement for Conv?

There might be some confusion here: convolutional operations are of course not limited to the typical implementation, i.e. an affine transformation followed by an activation function. One can choose any trainable function as a kernel to perform convolution, and KANs are no exception. When a KAN is used as a convolutional kernel, the operation is of course still "local". On the other hand, there is nothing inherently local about the standard kernels used in Conv layers either. So it is not that we are doing something strange just because convolution can be written as a matrix multiplication; we are simply using a KAN as the convolutional kernel.

so to make sure I'm getting this right, you're using KAN as the kernel function of the convolution?

Yep that's the whole idea.

minh-nguyenhoang avatar May 11 '24 01:05 minh-nguyenhoang

Hi, here I implemented ConvKAN with different activation formulations and their corresponding inference times: https://github.com/XiangboGaoBarry/ConvKAN-Zoo. We evaluate the results on the CIFAR-10 dataset.

XiangboGaoBarry avatar May 15 '24 16:05 XiangboGaoBarry

Hi, here I implemented ConvKAN with different activation formulations and their corresponding inference times: https://github.com/XiangboGaoBarry/ConvKAN-Zoo. We evaluate the results on the CIFAR-10 dataset.

You can also make a pull request to add your repo to this collection of KANs https://github.com/mintisan/awesome-kan

StarostinV avatar May 15 '24 18:05 StarostinV