JimiVex
Just to say, you can get 4000 tokens' worth of context length when running models through exllama. I've been doing that with the Chronos 30B model, with exllama in tow, with...
Here's a Reddit post discussing the larger context lengths you can get with exllama: https://www.reddit.com/r/LocalLLaMA/comments/14j4l7h/6000_tokens_context_with_exllama/
And the repo itself: https://github.com/turboderp/exllama
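For anyone who wants to try it, here's a minimal sketch of loading a GPTQ model through exllama's Python API with a raised context window. The model directory and filenames are hypothetical; the ExLlamaConfig / ExLlama / ExLlamaCache / ExLlamaTokenizer / ExLlamaGenerator classes come from the repo above (check its example scripts for the current API), and max_seq_len / compress_pos_emb are the settings the linked post is playing with:

```python
import os, glob

# These modules live at the top level of the turboderp/exllama repo;
# run this from a checkout of the repo (or with it on your PYTHONPATH).
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

model_directory = "/models/chronos-30b-gptq/"  # hypothetical path

tokenizer_path = os.path.join(model_directory, "tokenizer.model")
model_config_path = os.path.join(model_directory, "config.json")
model_path = glob.glob(os.path.join(model_directory, "*.safetensors"))[0]

config = ExLlamaConfig(model_config_path)
config.model_path = model_path
config.max_seq_len = 4096        # raise the context window (default is 2048)
# config.compress_pos_emb = 2.0  # RoPE scaling for even longer contexts,
                                 # per the 6000-token post linked above

model = ExLlama(config)
tokenizer = ExLlamaTokenizer(tokenizer_path)
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

print(generator.generate_simple("Once upon a time,", max_new_tokens=64))
```

That's roughly what the repo's basic example does, just with max_seq_len bumped up past the default.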