API streaming and non-streaming modes produce garbage output after the first query
When I use the ollama API, the first response works fine; then, without changing anything, subsequent requests return a response as if it's ignoring the system prompt and produce garbage. Restarting the ollama service fixes it, but only until the second query.
The latest version and the one before it both have this issue. My setup: I installed Ollama with the install bash script on Linux. It's running as a system service under my own user account and group, as that's the only way I could get the OLLAMA_MODEL dir env var to work. GPU acceleration with two NVIDIA GPUs (a 3090 and an A6000) and an AMD Ryzen 9 5900X 12-core CPU. I can see VRAM almost fill on the A6000 while the 3090 is about half full; then ollama hangs with the A6000 at 100%. The hang is caused by having ollama-webui running under Docker while also using the API. For the tests below, I stopped the ollama-webui Docker container and restarted the ollama service before testing.
Example queries:
curl http://localhost:11434/api/generate -d '{ "model": "llama2", "prompt": "C programming language", "system": "You are a poet. Write poems about topics given in the prompt.", "stream": false }'
curl http://localhost:11434/api/generate -d '{ "model": "llama2", "prompt": "C programming language", "system": "You are a poet. Write poems about topics given in the prompt.", "stream": true }'
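For comparison, the same request can be fired twice in a row from one script, which makes the "first response fine, second response garbage" behavior easy to see side by side. A minimal sketch in Python (stdlib only; the `build_payload` and `generate` helper names are mine, not part of Ollama, and this assumes the default endpoint at localhost:11434 with the llama2 model already pulled):

```python
import json
import urllib.request

# Default Ollama endpoint; adjust if your service listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, system: str, stream: bool = False) -> bytes:
    """Build the JSON body for a /api/generate request."""
    return json.dumps({
        "model": "llama2",
        "prompt": prompt,
        "system": system,
        "stream": stream,
    }).encode("utf-8")


def generate(prompt: str, system: str) -> str:
    """Send one non-streaming request and return the response text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt, system),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    system = "You are a poet. Write poems about topics given in the prompt."
    # The report says the first reply honors the system prompt and the
    # second does not, so print two attempts back to back for comparison.
    for attempt in (1, 2):
        print(f"--- attempt {attempt} ---")
        print(generate("C programming language", system))
```

Running this against a freshly restarted service should show the degradation on the second attempt if the bug reproduces.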
Could you please provide the model you're using?
Could you also try to reproduce this with the ollama CLI and see whether the same problem occurs?
Wow, I forgot about this. If you read the original post, it has all the answers you need. Thanks, but I gave up on Ollama after I spoke to the devs and community on Discord and no one took it seriously. Goodbye.
@nextdimension Thank you for the ticket. If you can reproduce with the latest version of Ollama, please feel free to reopen, but I'll close this for now.
FWIW: It sounds like you're running into resource limits and the restart frees up enough to start another round. Can you confirm that the next attempt after a restart exhibits the same behavior?
You are right, the model was in your ticket. That was a miss on my part.