idea: Support running multiple models simultaneously in Jan
Problem Statement
Currently, Jan (like LM Studio) appears unable to serve or manage multiple models at the same time. This limits flexibility for users who want to switch between different models or run them in parallel (e.g., Jan Nano and a larger remote model).
Feature Idea
Allow Jan to load and serve multiple models simultaneously, either via tabbed sessions or configurable contexts. This would enable more dynamic workflows, like using Jan Nano for quick queries and a larger model for deeper tasks, without restarting or reconfiguring Jan.
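As a rough illustration of the workflow this would enable, here is a minimal sketch assuming both models are already loaded and exposed through Jan's OpenAI-compatible local API; the port and model IDs (`jan-nano`, `qwen2.5-32b-instruct`) are placeholders, not confirmed values:

```python
# Sketch only: route a quick query to a small model and a heavier task to a
# larger one, with no unload/reload in between. Port and model IDs are assumed.
import requests

JAN_API = "http://localhost:1337/v1/chat/completions"  # assumed local server URL

def ask(model: str, prompt: str) -> str:
    """Send one chat completion request to the given model."""
    resp = requests.post(
        JAN_API,
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Quick lookup goes to the small, always-resident model...
print(ask("jan-nano", "Summarize this error message in one line: ..."))
# ...while a deeper task is sent to a larger model kept loaded alongside it.
print(ask("qwen2.5-32b-instruct", "Draft a detailed migration plan for ..."))
```

The point of the sketch is that the switch is just a different `model` field in the request, rather than a reload of the backend.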
User comment: https://www.reddit.com/r/LocalLLaMA/comments/1lf5yog/comment/mylorfy/
This is probably because, realistically, only a small subset of users would benefit from being able to run multiple models at once, due to VRAM restrictions on most consumer-level GPUs.
If we load and unload models on every switch, it also won't provide a very good user experience, because a model takes a few seconds to load into and out of memory.
cc @gau-nernst @qnixsynapse for opinions on this
The new llama.cpp backend supports running multiple models at the same time, provided enough VRAM is available (this is disabled by default).
@eckartal closing this as done, in case you want to follow up with the user