mistral.rs
Is it possible to add support for Infini-attention?
There's ongoing work to implement Infini-attention, as described in https://arxiv.org/pdf/2404.07143
In a nutshell, it allows for an essentially unbounded context length without the quadratic cost of standard attention, by compressing older context into a fixed-size memory. There's a proof of concept running a 10M-token context in less than 32GB of RAM here: https://github.com/mustafaaljadery/gemma-2B-10M
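For anyone unfamiliar with the paper, here is a rough sketch of the compressive-memory part as I understand it. It is not code from the paper or from the gemma-2B-10M repo. The class and method names (`CompressiveMemory`, `update`, `retrieve`) are my own, it uses plain Python lists instead of tensors, and it leaves out the local per-segment attention and the learned gate that mixes the two outputs. The point it tries to show is that the memory stays the same size no matter how many segments are fed in.

```python
import math


def elu_plus_one(x):
    # sigma(x) = ELU(x) + 1, which keeps every feature strictly positive.
    return x + 1.0 if x > 0 else math.exp(x)


class CompressiveMemory:
    """Hypothetical single-head memory: a d_key x d_value matrix M
    plus a d_key normalization vector z. Neither grows with context length."""

    def __init__(self, d_key, d_value):
        self.M = [[0.0] * d_value for _ in range(d_key)]
        self.z = [0.0] * d_key

    def update(self, keys, values):
        # Fold one segment in: M += sigma(K)^T V, z += sum over tokens of sigma(K).
        for k, v in zip(keys, values):
            sk = [elu_plus_one(x) for x in k]
            for i, ski in enumerate(sk):
                self.z[i] += ski
                for j, vj in enumerate(v):
                    self.M[i][j] += ski * vj

    def retrieve(self, query):
        # Read-out for one query: sigma(q) M / (sigma(q) . z).
        d_value = len(self.M[0])
        sq = [elu_plus_one(x) for x in query]
        denom = sum(a * b for a, b in zip(sq, self.z))
        if denom == 0.0:
            # Nothing has been stored yet.
            return [0.0] * d_value
        return [
            sum(sq[i] * self.M[i][j] for i in range(len(sq))) / denom
            for j in range(d_value)
        ]
```

Each new segment would call `retrieve` for its queries before calling `update` with its own keys and values, so per-segment cost depends on the segment length and the head dimensions rather than on the total number of tokens seen so far.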
I believe we will see more models adopting this approach, and official support for it would be a huge benefit to the community.
I don't have the Rust chops to pull this off, but I thought I'd at least bring it to your attention since you already have Phi-3 working with 128k context.
Thanks for all your hard work!