
How do we stop a model so that it releases GPU memory (without stopping the ollama server)?

Open riskk21 opened this issue 2 years ago • 8 comments

How do we stop a model so that it releases GPU memory (without stopping the ollama server)?

riskk21 avatar Oct 27 '23 05:10 riskk21

The memory will be released about 5 minutes after the last time you use the model.

technovangelist avatar Oct 27 '23 05:10 technovangelist
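A quick way to watch this happen (assuming an NVIDIA GPU; use your vendor's equivalent tool otherwise):

# refresh GPU memory usage every 5 seconds; the ollama runner's
# allocation should drop once the idle window has passed
watch -n 5 nvidia-smi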

Is there a special command?

riskk21 avatar Oct 27 '23 05:10 riskk21

It's automatic at this time. But we are looking into other options.

technovangelist avatar Oct 27 '23 06:10 technovangelist

@technovangelist can I modify the offloading time somewhere in the code?

erick1337 avatar Dec 04 '23 09:12 erick1337


Figured it out. In ollama/server/routes.go:

var defaultSessionDuration = 5 * time.Minute

erick1337 avatar Dec 04 '23 10:12 erick1337
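For anyone building from source, a minimal sketch of changing that default (the sed pattern assumes the exact line quoted above, and the build steps follow the repo's development docs at the time):

# from a clone of the ollama repo: raise the idle unload window to 30 minutes
sed -i 's/defaultSessionDuration = 5 \* time.Minute/defaultSessionDuration = 30 * time.Minute/' server/routes.go
go generate ./...
go build .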

That’s great to hear. There is an interesting PR using environment variables that may solve this for some folks.

technovangelist avatar Dec 08 '23 23:12 technovangelist

Yeah, it would be great to have this as an environment variable, especially when using LangChain from another host.

sigkill avatar Jan 01 '24 03:01 sigkill

I can confirm that it's annoying to have to wait for the model to reload (it takes a long time for me) when you're waiting for an answer. :) A setting would be fine.

skrew avatar Jan 09 '24 13:01 skrew

This can be done with:

curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": 0}'

which effectively unloads the model (assuming llama2 is loaded).

See this doc for more info: https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-keep-a-model-loaded-in-memory-or-make-it-unload-immediately

jmorganca avatar Feb 20 '24 00:02 jmorganca
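Per the FAQ linked above, keep_alive also accepts other values: a duration string, a number of seconds, or a negative number to keep the model loaded indefinitely. For example:

# keep the model loaded indefinitely
curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": -1}'

# keep the model loaded for 10 minutes after the last request
curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": "10m"}'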


It is best to use the environment variable setting to change how long the model is kept alive!

leoterry-ulrica avatar Mar 13 '24 03:03 leoterry-ulrica

Set the environment variable in the systemd unit:

vim /etc/systemd/system/ollama.service

Add this line under the [Service] section (-1 keeps models loaded indefinitely):

Environment="OLLAMA_KEEP_ALIVE=-1"

Then reload and restart:

systemctl daemon-reload
systemctl restart ollama.service

sistemasici avatar Apr 23 '24 12:04 sistemasici
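To confirm the variable was picked up after the restart, one option is to query the unit's properties:

# should print Environment=OLLAMA_KEEP_ALIVE=-1 among the unit's properties
systemctl show ollama | grep OLLAMA_KEEP_ALIVE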