How can I load two or more models simultaneously?
The future of AI is on the modern edge/endpoint, and I really like the Foundry Local strategy! Are there plans to support running multiple models simultaneously? I really hope to see this realized as soon as possible. Since Foundry Local provides an SDK, it makes it easy to build the future of AI on top of it.
There's no restriction on loading models simultaneously; however, models are quite large, so memory constraints will be the biggest factor in what's possible.
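To illustrate the point above: once two models are loaded, talking to both is mostly a matter of addressing each one by its model ID on the local OpenAI-compatible endpoint. A minimal sketch, assuming the Foundry Local service is already running; the base URL and model IDs below are placeholders (port and IDs vary per install), not confirmed values:

```python
import json
import urllib.request


def build_chat_request(model_id: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload for one model."""
    return {
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(endpoint: str, model_id: str, prompt: str) -> str:
    """POST a chat request to a locally running OpenAI-compatible endpoint.

    `endpoint` is assumed to look like "http://localhost:5273/v1"
    (hypothetical port -- check `foundry service status` for yours).
    """
    req = urllib.request.Request(
        f"{endpoint}/chat/completions",
        data=json.dumps(build_chat_request(model_id, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Usage would then be two calls with different model IDs, e.g. `chat(url, "phi-4-mini", ...)` followed by `chat(url, "qwen2.5-0.5b", ...)`; both models stay resident as long as memory allows, which is exactly the constraint described above.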
Thank you for the reply. I have a scenario that pairs Phi4-mini with bge-m3 to implement RAG locally, but there is no relevant guidance on how to do this.
Hi @goxia, as @skottmckay says, you can load as many models as your device supports.
If you are implementing a RAG scenario, note that Foundry Local does not currently support embedding models, but it will very soon.
Please let us know whether you're after guidance on loading multiple models or on RAG specifically.
If Foundry Local can support RAG, it will become much easier to realize these application scenarios. I am very interested in how to implement RAG using Foundry Local. Thank you for your attention and help.
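Until embedding models land in Foundry Local, the retrieval half of the RAG scenario discussed above can be prototyped with any embedding function and swapped for bge-m3 later. A toy sketch under that assumption; the `embed` function here is a deliberate stand-in (bag-of-words), not the bge-m3 API, and `build_prompt` just shows where the chat model (e.g. Phi4-mini) would come in:

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words count vector.
    Replace with a real bge-m3 call once Foundry Local exposes embeddings."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, docs: list, k: int = 1) -> list:
    """Return the k documents most similar to the query."""
    qv = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(qv, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list) -> str:
    """Assemble the augmented prompt to send to the chat model."""
    context = "\n".join(docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The design point is that retrieval and generation stay decoupled: `retrieve` only depends on `embed`, so dropping in bge-m3 later should not touch the rest of the pipeline, and `build_prompt`'s output goes to whichever chat model is loaded.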