
How do we stop a model so that it releases GPU memory (without stopping the ollama server)?

Open riskk21 opened this issue 2 years ago • 8 comments

How do we stop a model so that it releases GPU memory (without stopping the ollama server)?

riskk21 avatar Oct 27 '23 05:10 riskk21

The memory will be released about 5 minutes after the last time you use the model.

technovangelist avatar Oct 27 '23 05:10 technovangelist
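A quick way to watch this happen (assuming an NVIDIA GPU; use your vendor's equivalent tool otherwise):

# refresh GPU memory usage every 5 seconds; the ollama runner's
# allocation should drop once the idle window has passed
watch -n 5 nvidia-smi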

Is there a special command?

riskk21 avatar Oct 27 '23 05:10 riskk21

It's automatic at this time. But we are looking into other options.

technovangelist avatar Oct 27 '23 06:10 technovangelist

@technovangelist can I modify the offloading time somewhere in the code?

erick1337 avatar Dec 04 '23 09:12 erick1337


Figured it out. In ollama/server/routes.go:

var defaultSessionDuration = 5 * time.Minute

erick1337 avatar Dec 04 '23 10:12 erick1337
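For anyone building from source, a minimal sketch of changing that default (the sed pattern assumes the exact line quoted above, and the build steps follow the repo's development docs at the time):

# from a clone of the ollama repo: raise the idle unload window to 30 minutes
sed -i 's/defaultSessionDuration = 5 \* time.Minute/defaultSessionDuration = 30 * time.Minute/' server/routes.go
go generate ./...
go build .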

That’s great to hear. There is an interesting PR using environment variables that may solve this for some folks.

technovangelist avatar Dec 08 '23 23:12 technovangelist

Yeah, it would be great to have this as an environment variable, especially when using LangChain from another host.

sigkill avatar Jan 01 '24 03:01 sigkill

I can confirm that it's annoying to have to wait for the model to reload (it takes a long time for me) when you're waiting for an answer. :) A setting would be fine.

skrew avatar Jan 09 '24 13:01 skrew

This can be done with:

curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": 0}'

which effectively unloads the model (assuming llama2 is loaded).

See this doc for more info: https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-keep-a-model-loaded-in-memory-or-make-it-unload-immediately

jmorganca avatar Feb 20 '24 00:02 jmorganca
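Per the FAQ linked above, keep_alive also accepts other values: a duration string, a number of seconds, or a negative number to keep the model loaded indefinitely. For example:

# keep the model loaded indefinitely
curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": -1}'

# keep the model loaded for 10 minutes after the last request
curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": "10m"}'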


It is best to use the environment variable setting to change how long the model is kept alive!

leoterry-ulrica avatar Mar 13 '24 03:03 leoterry-ulrica

Set the environment variable in the systemd unit:

vim /etc/systemd/system/ollama.service

Add this line under the [Service] section (-1 keeps models loaded indefinitely):

Environment="OLLAMA_KEEP_ALIVE=-1"

Then reload and restart:

systemctl daemon-reload
systemctl restart ollama.service

sistemasici avatar Apr 23 '24 12:04 sistemasici
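To confirm the variable was picked up after the restart, one option is to query the unit's properties:

# should print Environment=OLLAMA_KEEP_ALIVE=-1 among the unit's properties
systemctl show ollama | grep OLLAMA_KEEP_ALIVE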