ollama
Add option to ignore 'keep_alive' in request body
When hosting this in prod, we would like to prevent our users from unloading the model from the GPU. Currently, whenever users talk to the model through the Continue extension, their requests reset keep_alive to 5 minutes.
It would be nice to have an environment variable like IGNORE_KEEP_ALIVE_REQUESTS=1 so that we can set OLLAMA_KEEP_ALIVE=-1 in the container.
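To make the request concrete, here is a rough sketch of the behaviour I have in mind. It is not Ollama's actual code: the function name `effective_keep_alive` is made up for illustration, `IGNORE_KEEP_ALIVE_REQUESTS` is only the variable proposed above, and the `"5m"` fallback is an assumption based on the 5 minute value we see today.

```python
import os
from typing import Mapping, Optional


def effective_keep_alive(
    request_keep_alive: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Pick the keep_alive value the server would apply to one request.

    Hypothetical helper, not part of Ollama.
    """
    # Server-side default; "5m" as the fallback is an assumption.
    server_default = env.get("OLLAMA_KEEP_ALIVE", "5m")

    # Proposed flag from this issue; it does not exist in Ollama today.
    ignore_requests = env.get("IGNORE_KEEP_ALIVE_REQUESTS") == "1"

    # With the flag set, whatever the client sent is discarded.
    if ignore_requests or request_keep_alive is None:
        return server_default
    return request_keep_alive


if __name__ == "__main__":
    container_env = {"OLLAMA_KEEP_ALIVE": "-1", "IGNORE_KEEP_ALIVE_REQUESTS": "1"}
    # A client sending keep_alive="5m" would no longer override the container setting.
    print(effective_keep_alive("5m", container_env))
```

With the flag unset, the request value would still win, so existing deployments would behave as they do now.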