cortex.cpp
idea: Add GPU offloading for larger/MoE models (e.g. mixtral-offloading)
Problem
Jan is great, but I'm limited in the number of models I can run on my 16GB GPU. I saw there is a project called mixtral-offloading that could solve my problem.
I realize this isn't your fault, but if there were a way to integrate Jan with other offloading modules, that would be extremely helpful.
Success Criteria
The ability to run larger LLMs such as Mixtral 8x7B on a 16GB GPU.
Additional context
Pretty self-explanatory. If it can be done, great. If it's too much work, I just need to get a bigger GPU at some point. :)
I think we have no plan for this yet, but it would be great to have. Maybe adding a new local inference provider would help. I will transfer this issue to nitro instead.
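For context on what's already achievable without a new provider: nitro's llama.cpp backend supports partial layer offloading through the `n_gpu_layers` model parameter. This is coarser than the per-expert LRU caching that mixtral-offloading implements, but it does let a quantized Mixtral 8x7B run with only part of the model resident in VRAM. Below is a minimal sketch against the llama.cpp C API; the GGUF file name and the layer count are illustrative assumptions, and exact function names vary between llama.cpp versions.

```cpp
#include "llama.h"

#include <cstdio>

int main() {
    // Initialize the llama.cpp backend (signature differs across versions).
    llama_backend_init();

    llama_model_params model_params = llama_model_default_params();
    // Keep only some of Mixtral's transformer layers on the GPU; the
    // remaining layers stay in system RAM and run on the CPU. Tune this
    // down until the GPU-resident portion fits the 16GB budget.
    // 12 is an illustrative starting point, not a recommended default.
    model_params.n_gpu_layers = 12;

    // Hypothetical local file name for a quantized Mixtral 8x7B GGUF.
    llama_model *model = llama_load_model_from_file(
        "mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf", model_params);
    if (model == nullptr) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

The trade-off is speed: with a Q4_K_M quantization the full model is roughly 26GB, so offloading only about a third of the layers keeps the GPU share under 16GB, but token generation is then bottlenecked by the CPU-resident layers. An integrated mixtral-offloading-style provider would improve on this by swapping individual experts rather than whole layers.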