cortex.cpp
idea: Add GPU offloading for larger/MoE models (e.g. mixtral-offloading)
Problem
Jan is great, but I'm limited in the number of models I can run on my 16GB GPU. I saw there is a project called mixtral-offloading that could solve my problem.
I realize this isn't your fault, but if there were a way to integrate Jan with other offloading modules, that would be extremely helpful.
Success Criteria
The ability to run larger LLMs such as Mixtral 8x7B on a 16GB GPU.
Additional context
Pretty self-explanatory. If it can be done, great. If it's too much work, I just need to get a bigger GPU at some point. :)
I think we have no plan for this yet, but it would be great to have. Maybe adding a new local inference provider would help. I will transfer this issue to nitro instead.
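For context on what's already achievable without a new provider: nitro's llama.cpp backend supports partial layer offloading through the `n_gpu_layers` model parameter. This is coarser than the per-expert LRU caching that mixtral-offloading implements, but it does let a quantized Mixtral 8x7B run with only part of the model resident in VRAM. Below is a minimal sketch against the llama.cpp C API; the GGUF file name and the layer count are illustrative assumptions, and exact function names vary between llama.cpp versions.

```cpp
#include "llama.h"

#include <cstdio>

int main() {
    // Initialize the llama.cpp backend (signature differs across versions).
    llama_backend_init();

    llama_model_params model_params = llama_model_default_params();
    // Keep only some of Mixtral's transformer layers on the GPU; the
    // remaining layers stay in system RAM and run on the CPU. Tune this
    // down until the GPU-resident portion fits the 16GB budget.
    // 12 is an illustrative starting point, not a recommended default.
    model_params.n_gpu_layers = 12;

    // Hypothetical local file name for a quantized Mixtral 8x7B GGUF.
    llama_model *model = llama_load_model_from_file(
        "mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf", model_params);
    if (model == nullptr) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

The trade-off is speed: with a Q4_K_M quantization the full model is roughly 26GB, so offloading only about a third of the layers keeps the GPU share under 16GB, but token generation is then bottlenecked by the CPU-resident layers. An integrated mixtral-offloading-style provider would improve on this by swapping individual experts rather than whole layers.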