Xingyi Yang
KAN was strong but faced scalability issues. We tackled this with 3 simple tricks. By combining KAN with Transformers, we've built a much stronger and more scalable model. 💪 📄...
We've just released a new paper on integrating Transformers with KAN. Check out the paper and feel free to explore our code repository! 📄 Paper: https://arxiv.org/abs/2409.10594 💻 Code: https://github.com/Adamdad/kat
### Model description

The Kolmogorov–Arnold Transformer (KAT) replaces the standard MLP layers in transformers with Kolmogorov–Arnold Network (KAN) layers, improving the model's expressiveness and overall performance.

### Open source status...
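For readers who want a concrete picture of that swap before diving into the repo, here is a minimal, unofficial PyTorch sketch: a standard pre-norm transformer block whose MLP is replaced by a KAN-style layer built from learnable rational activations shared across channel groups. The names `RationalActivation`, `KANLayer`, and `KATBlock`, the group count, and the rational-function parameterization are illustrative assumptions, not the repository's actual API; see the linked code for the real implementation.

```python
# Hypothetical sketch (not the official KAT code): a transformer block
# whose MLP is swapped for a KAN-style layer, as the model description says.
import torch
import torch.nn as nn


class RationalActivation(nn.Module):
    """Learnable rational function p(x)/q(x); a spline-free KAN basis."""

    def __init__(self, p_order=5, q_order=4):
        super().__init__()
        self.p = nn.Parameter(torch.randn(p_order + 1) * 0.1)
        self.q = nn.Parameter(torch.randn(q_order) * 0.1)

    def forward(self, x):
        # Horner evaluation of the numerator polynomial.
        num = torch.zeros_like(x)
        for c in self.p:
            num = num * x + c
        # Horner evaluation of the denominator polynomial.
        den = torch.zeros_like(x)
        for c in self.q:
            den = den * x + c
        # |den| + 1 keeps the denominator bounded away from zero.
        return num / (1.0 + den.abs())


class KANLayer(nn.Module):
    """Drop-in MLP replacement: linear -> learnable rational activation
    (one set of coefficients shared per channel group) -> linear."""

    def __init__(self, dim, hidden, groups=8):
        super().__init__()
        assert hidden % groups == 0
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, dim)
        self.groups = groups
        self.acts = nn.ModuleList(RationalActivation() for _ in range(groups))

    def forward(self, x):
        h = self.fc1(x)
        chunks = h.chunk(self.groups, dim=-1)
        h = torch.cat([act(c) for act, c in zip(self.acts, chunks)], dim=-1)
        return self.fc2(h)


class KATBlock(nn.Module):
    """Standard pre-norm transformer block with the MLP replaced by KANLayer."""

    def __init__(self, dim=256, heads=8, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.kan = KANLayer(dim, dim * mlp_ratio)

    def forward(self, x):
        h = self.norm1(x)
        a, _ = self.attn(h, h, h)
        x = x + a
        return x + self.kan(self.norm2(x))


# Usage: same interface as a vanilla transformer block.
x = torch.randn(2, 16, 256)  # (batch, tokens, dim)
y = KATBlock()(x)
print(y.shape)  # torch.Size([2, 16, 256])
```

Sharing one activation's coefficients across a whole channel group, rather than learning a separate univariate function per edge as in the original KAN, is what keeps the parameter count and memory traffic close to a plain MLP; that trade-off is the intuition behind the scalability claim above, though the exact mechanism in KAT is the one documented in the paper and repo.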