
When using Ollama as the LLM engine, does the model have to be reloaded every time?

Open · 17Reset opened this issue 1 year ago · 1 comment

If you use Ollama on its own, it loads the model into the GPU once and keeps it there, so you don't have to wait for the model to reload on every API call. But in private-gpt, the model is reloaded every time a question is asked, which greatly increases the Q&A time.

17Reset · Mar 28 '24

I think https://github.com/zylon-ai/private-gpt/pull/1800 solves your problem. The default keep_alive is 5m; increase it:

ollama:
  keep_alive: 30m
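
For reference, here is a minimal sketch of how the same keep_alive value can be passed directly to Ollama's /api/generate endpoint, assuming a local Ollama server on the default port 11434 and an already-pulled model (the model name "llama2" is just an illustration, not something this thread specifies):

```python
import requests

# keep_alive tells Ollama how long to keep the model loaded in memory
# after the request finishes, so subsequent calls skip the reload.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",            # hypothetical model name for illustration
        "prompt": "Why is the sky blue?",
        "stream": False,
        "keep_alive": "30m",          # same value as the settings.yaml snippet above
    },
    timeout=120,
)
print(response.json()["response"])
```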

dbzoo · Apr 06 '24