mistral.rs Is it possible to add support for Infini-attention?

Open sdmorrey opened this issue 9 months ago • 2 comments

There's some work being done to implement Infini-attention, as described in https://arxiv.org/pdf/2404.07143

In a nutshell, it allows an essentially unlimited context length without incurring the quadratic cost of standard attention. There's a proof of concept running a 10M-token context in less than 32GB of RAM here: https://github.com/mustafaaljadery/gemma-2B-10M
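To make the idea concrete, here is a rough sketch of the compressive memory as I understand it from the paper: each segment's keys and values are folded into a fixed-size matrix, so memory does not grow with context length. This is only an illustration under my own assumptions, not the paper's reference code and not anything from mistral.rs. The names `CompressiveMemory`, `update`, and `retrieve` are hypothetical, and the sketch leaves out the local dot-product attention and the learned gate that the paper combines with the memory read.

```python
import math


def elu_plus_one(x):
    # sigma(x) = ELU(x) + 1, which keeps every feature strictly positive.
    return x + 1.0 if x > 0 else math.exp(x)


class CompressiveMemory:
    """Fixed-size associative memory (hypothetical name, illustrative only)."""

    def __init__(self, d_key, d_value):
        # M has shape d_key x d_value and z has length d_key,
        # regardless of how many tokens are written.
        self.M = [[0.0] * d_value for _ in range(d_key)]
        self.z = [0.0] * d_key

    def update(self, keys, values):
        # M <- M + sigma(K)^T V and z <- z + sum_t sigma(K_t) for one segment.
        for k, v in zip(keys, values):
            sk = [elu_plus_one(x) for x in k]
            for i, ski in enumerate(sk):
                self.z[i] += ski
                row = self.M[i]
                for j, vj in enumerate(v):
                    row[j] += ski * vj

    def retrieve(self, query):
        # A_mem = sigma(q) M / (sigma(q) . z); zeros if nothing is stored yet.
        sq = [elu_plus_one(x) for x in query]
        d_value = len(self.M[0])
        denom = sum(a * b for a, b in zip(sq, self.z))
        if denom == 0.0:
            return [0.0] * d_value
        return [
            sum(sq[i] * self.M[i][j] for i in range(len(sq))) / denom
            for j in range(d_value)
        ]
```

Each `update` call costs time proportional to the segment length, and the state stays at `d_key * d_value + d_key` numbers per head, which is where the "no quadratic penalty" claim comes from.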

I believe we will see more models adopting this approach, and if it were officially supported, it would be a huge benefit to the community.

I don't have the Rust chops to pull this off, but I thought I'd at least bring it to your attention since you have Phi-3 with 128k context working already.

Thanks for all your hard work!

May 11 '24 06:05 sdmorrey
