Richer grammars
What are you trying to do?
Being able to specify grammars is great, but the feature seems underutilized at the moment. This is mostly a thought dump, based on experimentation, on how it could be improved...
How should we solve this?
- Using a llama.cpp grammar directly would be pretty powerful and nice to have (see the sketch after this list)
- Specifying a JSON schema for JSON output. llama.cpp's JSON output is usually forced into a specific key order, and Ollama's JSON mode isn't constrained by a schema at all
- Changing the format on the fly is useful, but I think it would also be nice to have a way to specify a grammar in the Modelfile.
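To make the first idea concrete, here's a small GBNF grammar in the format llama.cpp already accepts. It forces the output to be a single JSON object with one string-valued "summary" key; the rule names are just illustrative:

root   ::= "{" ws "\"summary\"" ws ":" ws string ws "}"
string ::= "\"" char* "\""
char   ::= [^"\\] | "\\" ["\\bfnrt]
ws     ::= [ \t\n]*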
What is the impact of not solving this?
Not having either of the first two ideas is annoying, since there's no way to guarantee that a model generates a response in the format I want. The third idea would let one build an "LLM API", where a model generates a response in a specific shape every time (imagine bundling a "summary LLM" that always responds with {"summary":"..."} as a Modelfile).
Anything else?
No response
There are a few open PRs for this behaviour; the most recent is https://github.com/ollama/ollama/pull/3618, and it would be amazing to get it merged in. It's a two-line change that exposes the llama.cpp GBNF functionality via Modelfile parameters. It's not my patch, but I've compiled it and used it locally, and it works really well.
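To illustrate (this is my sketch, not an example from the PR itself), a Modelfile using the patch might look like this, assuming the parameter is named grammar and takes a one-line GBNF string:

FROM llama3:8b
SYSTEM You are a summarizer. Reply only with a JSON summary.
# Hypothetical: "grammar" is the parameter the patch adds; it's not in upstream Ollama
PARAMETER grammar root ::= "{" "\"summary\":" "\"" [^"]* "\"" "}"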
Specifying grammars in the Modelfile is one thing, but it would be much more useful to be able to send a grammar string with the request, similar to the llama.cpp server. Is that possible with Ollama now? There's nothing about grammars in the API docs.
Yes, you can send the grammar as an option when you submit a request, once the patch I linked above is applied. It just isn't documented!
Here's an example:
POST http://localhost:11434/api/chat
{
  "model": "llama3:8b",
  "stream": false,
  "messages": [
    {"role": "user", "content": "The sky is blue, true or false?"}
  ],
  "options": {
    "grammar": "root ::= (\"true\" | \"false\")"
  }
}
Response:
{
  "model": "llama3:8b",
  "created_at": "2024-04-21T20:53:15.212659393Z",
  "message": {
    "role": "assistant",
    "content": "true"
  },
  "done": true,
  "total_duration": 545867966,
  "load_duration": 5912270,
  "prompt_eval_duration": 213384000,
  "eval_count": 2,
  "eval_duration": 202538000
}
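For anyone who wants to try it, here's the same request as a curl command (this still requires the patch to be applied):

curl http://localhost:11434/api/chat -d '{
  "model": "llama3:8b",
  "stream": false,
  "messages": [{"role": "user", "content": "The sky is blue, true or false?"}],
  "options": {"grammar": "root ::= (\"true\" | \"false\")"}
}'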
> It just isn't documented!
So I can add docs about it in #3618.
Update: done.
That's really, really cool. Though as far as I can see, it's not merged into main yet.
This would be so nice to have: one step closer to ditching my custom Python bindings for llama.cpp and using Ollama, now that it has also started supporting concurrent models. Sending a grammar together with the request is a great feature to support.
I added support for this and JSON schema in #5348.
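Roughly, a schema-constrained request would look something like this; the exact field placement may differ, so check #5348 for the actual interface:

POST http://localhost:11434/api/chat
{
  "model": "llama3:8b",
  "stream": false,
  "messages": [
    {"role": "user", "content": "Summarize: the sky is blue."}
  ],
  "format": {
    "type": "object",
    "properties": {"summary": {"type": "string"}},
    "required": ["summary"]
  }
}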
Hey all, I know there's an automated ping here, but to better align everyone, please check out and comment on my new call to the Ollama team for clarity. As always, please be civil and stay on topic! 😄 - https://github.com/ollama/ollama/issues/6237