How do we stop a model so that it releases GPU memory? (not the ollama server)
The memory will be released about 5 minutes after the last time you use the model.
Is there a special command?
It's automatic at this time. But we are looking into other options.
@technovangelist can I modify the offloading time somewhere in the code?
Figured it out: in ollama/server/routes.go:
var defaultSessionDuration = 5 * time.Minute
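Note that changing this constant means rebuilding ollama from source. A minimal sketch, assuming a checkout of the repo and a working Go toolchain (see the project's development docs for the exact steps):

# from the root of the ollama checkout
go generate ./...   # prepares/builds the bundled llama.cpp pieces
go build .          # produces the ollama binary with your new timeout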
That’s great to hear. There is an interesting PR using environment variables that may solve this for some folks.
Yeah, it would be great to have this as an environment variable, especially when using langchain from another host.
I can confirm that it's annoying to have to wait for the model to reload (it takes a long time for me) when you're waiting for an answer. :) A setting would be fine.
This can be done by setting keep_alive to 0, which effectively unloads the model (assuming llama2 is loaded):
curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": 0}'
See this doc for more info: https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-keep-a-model-loaded-in-memory-or-make-it-unload-immediately
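Per that FAQ, keep_alive also accepts a duration string or a negative number to keep the model loaded indefinitely, for example:

# keep llama2 loaded in memory indefinitely
curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": -1}'

# keep llama2 loaded for 30 minutes after each request
curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": "30m"}'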
It is best to use an environment variable to change how long the model stays alive. Set the environment variable in:
vim /etc/systemd/system/ollama.service
Environment="OLLAMA_KEEP_ALIVE=-1"
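After editing the unit file, reload systemd and restart the service so the new value takes effect:

sudo systemctl daemon-reload
sudo systemctl restart ollama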