MNIST Example
Thank you for the excellent work! I wanted to ask whether it is possible to run toy experiments on the MNIST dataset. As you know, MNIST is a standard benchmark for image-classification tasks.
Thanks!
What would the number of input variables be? Pixel level, i.e. 224×224, or even 1024×1024 for a high-resolution image task? Using a KAN end-to-end directly on raw pixels doesn't seem like the proper scenario; the input dimension is too high.
If it were me: use a conv/CNN or patch/ViT front end to reduce the input to a small number of features, e.g. <100, and then apply a KAN.
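A minimal sketch of that idea (my own layer sizes, not from this thread): a small CNN compresses a 28×28 MNIST image down to 64 features, which would be a manageable input size for a KAN head such as pykan's `KAN(width=[64, 10])`. The KAN itself is stood in for by a placeholder `Linear` so the sketch stays self-contained.

```python
import torch
import torch.nn as nn

class CNNFrontEnd(nn.Module):
    """Hypothetical CNN front end: 28x28 image -> 64-dim feature vector."""

    def __init__(self, out_features=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),   # 28x28 -> 14x14
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 14x14 -> 7x7
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                                # 7x7 -> 1x1
        )
        self.proj = nn.Linear(32, out_features)  # a KAN head would go here instead

    def forward(self, x):
        h = self.features(x).flatten(1)  # (batch, 32)
        return self.proj(h)              # (batch, 64): small enough for a KAN

x = torch.randn(8, 1, 28, 28)
feats = CNNFrontEnd()(x)
print(feats.shape)  # torch.Size([8, 64])
```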
another related issue: https://github.com/KindXiaoming/pykan/issues/9
MNIST images are 28×28, which is relatively small. I haven't read the paper carefully yet, but I think more research is needed to generalize KANs to vision.
Has anyone tried this dataset before? I set `model = KAN(width=[28*28, 10], grid=3, k=3)` and `results = model.train(dataset, opt="LBFGS", steps=20, metrics=(train_acc, test_acc), loss_fn=torch.nn.CrossEntropyLoss())`. I get the error: `RuntimeError: [enforce fail at alloc_cpu.cpp:114] data. DefaultCPUAllocator: not enough memory: you tried to allocate 30105600000 bytes`. And I had already run it for 0.5 h on my poor i5-9300H 😂 Yet a simple MLP with 28*28 -> 1000 -> 500 -> 10 can be trained within a minute.
I made a basic test on MNIST in https://github.com/KindXiaoming/pykan/pull/57, though it gives me a segmentation fault on CUDA, or with networks larger than [28*28, 25, 10].
It should be possible to apply convolution with https://pytorch.org/docs/stable/generated/torch.nn.Unfold.html and use relatively small KAN
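A minimal sketch of that Unfold idea (patch size and stride are my own choice): `torch.nn.Unfold` slices each 28×28 image into 16 non-overlapping 7×7 patches, so a small shared KAN would only ever see 49-dimensional inputs instead of all 784 pixels at once.

```python
import torch
import torch.nn as nn

# Split each 28x28 image into non-overlapping 7x7 patches.
unfold = nn.Unfold(kernel_size=7, stride=7)

x = torch.randn(8, 1, 28, 28)      # batch of MNIST-sized images
patches = unfold(x)                # (8, 1*7*7, 16) = (batch, patch_dim, n_patches)
patches = patches.transpose(1, 2)  # (8, 16, 49): 16 patches of 49 values each

# A shared KAN with 49 inputs (e.g. KAN(width=[49, ...])) could now be
# applied per patch, convolution-style, instead of one 784-input KAN.
print(patches.shape)  # torch.Size([8, 16, 49])
```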
I trained a ClassicMLP model on MNIST with an input size of 784, hidden layer sizes of [128, 64], and an output size of 10. This model was trained using the Adam optimizer with a learning rate of 0.001. It achieved an accuracy of approximately 93.93% on the test set.
Regarding the KAN model, I used a basic architecture with a single hidden layer of size 128 and an output layer of size 10, similar to the ClassicMLP output layer. I trained the KAN network using the same Adam optimizer and learning rate. It achieved a higher accuracy of around 97.03% on the test set.
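For reference, the MLP baseline described above can be sketched as follows. Only the architecture (784 -> 128 -> 64 -> 10), optimizer (Adam), and learning rate (0.001) come from the comment; the training-loop details are my own assumptions.

```python
import torch
import torch.nn as nn

class ClassicMLP(nn.Module):
    """MLP baseline as described: 784 -> 128 -> 64 -> 10."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(784, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 10),
        )

    def forward(self, x):
        return self.net(x.flatten(1))  # flatten 1x28x28 -> 784

model = ClassicMLP()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# One dummy training step on random data, just to show the loop shape.
x, y = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(model(x).shape)  # torch.Size([32, 10])
```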
Have you tried any optimizations or experimented with different architectures to address the memory issue with your KAN model?
Hello @goknurarican, can you please share the demo code, especially for the KAN model on MNIST? The current pykan model can hardly fit in GPU memory with large input dimensions like 768. I am curious how you did that.
What is the network architecture? [784 (= 28*28), 128, 10]? Did you directly flatten 28×28 to 768? What grid and k did you use? k=3? grid?