Ollama hangs on `Resampling because token 17158: '<token>' does not meet grammar rules`
Situation: Ollama gets stuck in an infinite loop on Ubuntu 22.04 with certain requests. It appears to die: broken pipes don't break it out of the loop, and I have to restart the service. When I say "die" I mean no further requests are handled. Since the log at INFO level only writes once a response has been sent back, nothing is logged in this scenario.
My approach to solving it:
set OLLAMA_DEBUG=1 and look at the journalctl logs. I've set it in two places:
environment variable:
export OLLAMA_DEBUG=1
set | grep OLLAMA
OLLAMA_DEBUG=1
And in the [Service] section of ollama.service:
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/home/…<various paths>…:/snap/bin OLLAMA_DEBUG=1"
[Install]
WantedBy=default.target
Then I restarted the server successfully.
sudo systemctl daemon-reload
sudo systemctl restart ollama.service
Expected output: all slog.Debug and higher-level messages logged.
Observed: only INFO-level messages seem to be logged. But the GPU is busy, so it's doing SOMETHING.
Anyone know how I can confirm that the debug flag is set correctly?
Or more to the point, anyone know how I can better diagnose the server's infinite loop? It only happens with a particular model, so maybe the GGUF config isn't quite right? It's calebfahlgren/natural-functions:latest
Does the loop respect systemd's RestartSec=3 setting?
You could diagnose this by changing the ollama.service file so that ExecStart runs a wrapper script instead of ollama serve directly, for example to hold the process open and/or dump its environment variables; a minimal sketch follows.
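Something like this, assuming the script lives at /usr/local/bin/ollama-wrapper.sh (the path and filename are illustrative):

#!/bin/sh
# Record the environment the service actually sees, then hand off to the real binary.
env > /tmp/ollama-env.txt
exec /usr/local/bin/ollama serve

Make it executable (chmod +x) and point the unit at it with ExecStart=/usr/local/bin/ollama-wrapper.sh.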
To see a running process's environment and check for debug flags, just read it from procfs:
cat /proc/$PID/environ | tr '\0' '\n' | less
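If you don't know the PID, pgrep can find it (assuming the process is named ollama):

PID=$(pgrep -x ollama)
cat /proc/$PID/environ | tr '\0' '\n' | grep OLLAMA

You can also ask systemd directly what environment it configured for the unit:

systemctl show ollama.service -p Environment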
Edit: rather than spending time on the inconveniences and overheads of systemd, you could stop the service and just run sudo -u ollama /usr/local/bin/ollama serve directly, then monitor the log output as you run your model in a separate terminal window.
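Something like this should work (sudo may scrub inherited variables, so passing the flag through env is the safe route):

sudo systemctl stop ollama
sudo -u ollama env OLLAMA_DEBUG=1 /usr/local/bin/ollama serve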
Ah, that's great. Running it directly a) set the environment variable properly, and b) I can now see level=DEBUG in the logs. I guess I'm still not clear on how to alter ollama.service so it sets the environment variable properly.
It's very interesting: it seems like instructor is generating prompts that even mistral:7b can't cope with, and more interestingly, in a way that causes Ollama to barf. Ollama gets stuck here, not returning at all.
time=2024-02-23T15:58:59.224Z level=DEBUG source=routes.go:1225 msg="chat handler" prompt="[INST] \n As a genius expert, your task is to understand the content and provide\n the parsed objects in json that match the following json_schema:\n\n {'messages': {'items': {'$ref': '#/$defs/MessagePair'}, 'title': 'Messages', 'type': 'array'}}\n \nHere are some more definitions to adhere too:\n{'MessagePair': {'properties': {'respectful': {'title': 'Respectful', 'type': 'string'}, 'nondisrespectful': {'title': 'Nondisrespectful', 'type': 'string'}}, 'required': ['respectful', 'nondisrespectful'], 'title': 'MessagePair', 'type': 'object'}}\n\n\n As a genius expert, your task is to understand the content and provide\n the parsed objects in json that match the following json_schema:\n\n {'messages': {'items': {'$ref': '#/$defs/MessagePair'}, 'title': 'Messages', 'type': 'array'}}\n \nHere are some more definitions to adhere too:\n{'MessagePair': {'properties': {'respectful': {'title': 'Respectful', 'type': 'string'}, 'nondisrespectful': {'title': 'Nondisrespectful', 'type': 'string'}}, 'required': ['respectful', 'nondisrespectful'], 'title': 'MessagePair', 'type': 'object'}} Generate 5 pairs of short instant messages, where each pair contains a non-disrespectful (respectful or neutral) message and a corresponding disrespectful message exemplifying 'Dishonesty'. [/INST]" images=0
[1708703939] slot 0 is processing [task id: 0]
[1708703939] slot 0 : in cache: 0 tokens | to process: 370 tokens
[1708703939] slot 0 : kv cache rm - [0, end)
[1708703939] Resampling because token 17158: ' Based' does not meet grammar rules
[1708703941] Resampling because token 12069: 'Please' does not meet grammar rules
[1708703941] Resampling because token 12069: 'Please' does not meet grammar rules
[1708703941] Resampling because token 12069: 'Please' does not meet grammar rules
[1708703941] Resampling because token 12069: 'Please' does not meet grammar rules
[1708703941] Resampling because token 12069: 'Please' does not meet grammar rules
[1708703951] slot 0: context shift - n_keep = 0, n_left = 2046, n_discard = 1023
[1708703959] slot 0: context shift - n_keep = 0, n_left = 2046, n_discard = 1023
[1708703967] slot 0: context shift - n_keep = 0, n_left = 2046, n_discard = 1023
[1708703974] slot 0: context shift - n_keep = 0, n_left = 2046, n_discard = 1023
[1708703982] slot 0: context shift - n_keep = 0, n_left = 2046, n_discard = 1023
It just does this until I kill it, blocking the thread and the socket.
Hi @boxabirds, are you using JSON mode by chance? Sorry you hit this!
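If so, a quick way to check whether JSON mode alone triggers the hang is a direct API call with format set to json; something like this (model name taken from above, prompt abbreviated from the log):

curl http://localhost:11434/api/chat -d '{
  "model": "calebfahlgren/natural-functions:latest",
  "format": "json",
  "messages": [{"role": "user", "content": "Generate 5 pairs of respectful/disrespectful instant messages as JSON."}]
}'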
Environment="PATH=/home/…<various paths>…:/snap/bin OLLAMA_DEBUG=1"
OLLAMA_DEBUG needs to be on its own Environment line:
Environment="PATH=/home/…<various paths>…:/snap/bin"
Environment="OLLAMA_DEBUG=1"
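After editing, reload and restart, then confirm systemd picked the variable up:

sudo systemctl daemon-reload
sudo systemctl restart ollama
systemctl show ollama -p Environment

(Alternatively, sudo systemctl edit ollama puts the override in a drop-in file so the installed unit stays untouched.)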
@boxabirds if you're still having problems, please share a bit more information about how you're calling Ollama so we can try to reproduce and understand the problem.