private-gpt
When using Ollama as the LLM engine, does the model have to be reloaded every time?
If you use Ollama on its own, it loads the model into the GPU and keeps it there, so you don't have to reload the model on every API call. But in private-gpt, the model is reloaded every time a question is asked, which greatly increases the Q&A time.
I think this solves your problem: https://github.com/zylon-ai/private-gpt/pull/1800. The default `keep_alive` is 5m; increase it:
ollama:
  keep_alive: 30m
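For reference, Ollama accepts `keep_alive` values as a duration string (e.g. `"5m"`, `"30m"`, `"1h"`), with `-1` meaning keep the model loaded indefinitely and `0` meaning unload it immediately after the request. As a rough sketch of how such values map to seconds (this helper is purely illustrative and not part of private-gpt or Ollama):

```python
import re

def keep_alive_seconds(value: str) -> int:
    """Illustrative parser for Ollama-style keep_alive duration strings.

    "-1" keeps the model in memory indefinitely; "0" unloads it right
    after the request; otherwise a number plus a unit (s/m/h) gives the
    idle time before the model is unloaded.
    """
    if value in ("-1", "0"):
        return int(value)
    match = re.fullmatch(r"(\d+)([smh])", value)
    if not match:
        raise ValueError(f"unrecognized keep_alive value: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return amount * {"s": 1, "m": 60, "h": 3600}[unit]
```

With the default of `"5m"`, the model is dropped from GPU memory after five idle minutes, so a question asked later forces a full reload; raising it to `"30m"` (or `-1`) keeps the model warm between questions.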