private-gpt
When using Ollama as the LLM engine, does the model have to be reloaded every time?
If you use Ollama on its own, it loads the model into the GPU and keeps it there, so you don't have to reload the model on every API call. But in private-gpt, the model is reloaded every time a question is asked, which greatly increases the Q&A time.
I think this solves your problem: https://github.com/zylon-ai/private-gpt/pull/1800. The default `keep_alive` is 5m; increase it:
ollama:
  keep_alive: 30m
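For reference, Ollama accepts `keep_alive` values as a duration string (e.g. `"5m"`, `"30m"`, `"1h"`), with `-1` meaning keep the model loaded indefinitely and `0` meaning unload it immediately after the request. As a rough sketch of how such values map to seconds (this helper is purely illustrative and not part of private-gpt or Ollama):

```python
import re

def keep_alive_seconds(value: str) -> int:
    """Illustrative parser for Ollama-style keep_alive duration strings.

    "-1" keeps the model in memory indefinitely; "0" unloads it right
    after the request; otherwise a number plus a unit (s/m/h) gives the
    idle time before the model is unloaded.
    """
    if value in ("-1", "0"):
        return int(value)
    match = re.fullmatch(r"(\d+)([smh])", value)
    if not match:
        raise ValueError(f"unrecognized keep_alive value: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return amount * {"s": 1, "m": 60, "h": 3600}[unit]
```

With the default of `"5m"`, the model is dropped from GPU memory after five idle minutes, so a question asked later forces a full reload; raising it to `"30m"` (or `-1`) keeps the model warm between questions.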