idea: Support running multiple models simultaneously in Jan
Problem Statement
Currently, Jan (like LM Studio) appears unable to serve or manage multiple models at the same time. This limits flexibility for users who want to switch between different models or run them in parallel (e.g., Jan Nano and a larger remote model).
Feature Idea
Allow Jan to load and serve multiple models simultaneously, either via tabbed sessions or configurable contexts. This would enable more dynamic workflows, like using Jan Nano for quick queries and a larger model for deeper tasks, without restarting or reconfiguring Jan.
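As a rough illustration of the workflow this would enable, here is a minimal sketch assuming both models are already loaded and exposed through Jan's OpenAI-compatible local API; the port and model IDs (`jan-nano`, `qwen2.5-32b-instruct`) are placeholders, not confirmed values:

```python
# Sketch only: route a quick query to a small model and a heavier task to a
# larger one, with no unload/reload in between. Port and model IDs are assumed.
import requests

JAN_API = "http://localhost:1337/v1/chat/completions"  # assumed local server URL

def ask(model: str, prompt: str) -> str:
    """Send one chat completion request to the given model."""
    resp = requests.post(
        JAN_API,
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Quick lookup goes to the small, always-resident model...
print(ask("jan-nano", "Summarize this error message in one line: ..."))
# ...while a deeper task is sent to a larger model kept loaded alongside it.
print(ask("qwen2.5-32b-instruct", "Draft a detailed migration plan for ..."))
```

The point of the sketch is that the switch is just a different `model` field in the request, rather than a reload of the backend.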
User comment: https://www.reddit.com/r/LocalLLaMA/comments/1lf5yog/comment/mylorfy/
This is probably because, realistically, only a small subset of users would benefit from being able to run multiple models at once, due to VRAM restrictions on most consumer-level GPUs.
If we load and unload models on every switch, it also won't provide a very good user experience, because a model takes a few seconds to load into and out of memory.
cc @gau-nernst @qnixsynapse for opinions on this
The new llama.cpp backend supports running multiple models at the same time, provided enough VRAM is available (this is disabled by default).
@eckartal closing this as done, in case you want to follow up with the user