
Richer grammars

Open · tezlm opened this issue 1 year ago · 6 comments

What are you trying to do?

Being able to specify grammars is great, but the feature seems a bit underutilized at the moment. This is mostly a thought dump, based on experimentation, on how it could be improved...

How should we solve this?

  • Using llama.cpp grammars (GBNF) directly would be pretty powerful and nice to have.
  • Accepting a JSON Schema for JSON output. llama.cpp's JSON mode is usually forced into a specific key order, and Ollama's JSON mode isn't constrained by a schema at all.
  • Changing the format on the fly is useful, but it would also be nice to have a way to specify a grammar in the Modelfile.

What is the impact of not solving this?

Without either of the first two ideas, there's no way to guarantee that a model generates a response in the format I want, which is annoying. The third idea allows one to build an "LLM API", where a model generates a specific response shape every time (imagine bundling a "summary llm" that always responds with {"summary":"..."} as a Modelfile).
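
As a sketch of what that "summary llm" constraint could look like, here is a llama.cpp-style GBNF grammar that forces the output to be a single {"summary": "..."} object (loosely adapted from the JSON grammar examples that ship with llama.cpp; the string rule is simplified):

# Sketch: constrain output to exactly one {"summary": "..."} object.
root   ::= "{" ws "\"summary\"" ws ":" ws string ws "}"
string ::= "\"" ( [^"\\] | "\\" ["\\/bfnrt] )* "\""
ws     ::= [ \t\n]*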

Anything else?

No response

tezlm · Apr 12 '24 18:04

There are a few open PRs for this behaviour, the most recent being https://github.com/ollama/ollama/pull/3618; it would be amazing to get it merged. It's a two-line change that exposes llama.cpp's GBNF functionality via Modelfile parameters. It's not my patch, but I've compiled and used it locally and it works really well.
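
With that patch applied, a Modelfile could bake the grammar in along these lines. This is only a sketch: the parameter name grammar is assumed from the option name the patch exposes, and the exact quoting rules may differ.

# Sketch of a Modelfile with the patched grammar parameter; the
# parameter name and quoting here are assumptions based on the patch.
FROM llama3:8b
PARAMETER grammar "root ::= (\"true\" | \"false\")"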

ravenscroftj · Apr 14 '24 07:04

Specifying grammars in the Modelfile is one thing, but it would be much more useful to be able to send a grammar string in the request, similar to the llama.cpp server. Is that possible with Ollama now? There's nothing about grammars in the API docs.

rhohndorf · Apr 21 '24 20:04

Yes, with the patch I linked above applied, you can send the grammar as an option when you submit a request. It just isn't documented!

Here's an example:

POST http://localhost:11434/api/chat

{
  "model": "llama3:8b",
  "stream": false,
  "messages": [
    {"role": "user", "content": "The sky is blue, true or false?"}
  ],
  "options": {
    "grammar": "root ::= (\"true\" | \"false\")"
  }
}

Response:

{
  "model": "llama3:8b",
  "created_at": "2024-04-21T20:53:15.212659393Z",
  "message": {
    "role": "assistant",
    "content": "true"
  },
  "done": true,
  "total_duration": 545867966,
  "load_duration": 5912270,
  "prompt_eval_duration": 213384000,
  "eval_count": 2,
  "eval_duration": 202538000
}
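
For reference, here is a minimal Python sketch of the same call using the requests library; it assumes a locally built Ollama with the grammar patch applied, since the "grammar" option isn't in stock Ollama.

import requests

# Assumes an Ollama build with the grammar patch (PR #3618) applied;
# the "grammar" option is not part of stock Ollama.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3:8b",
        "stream": False,
        "messages": [
            {"role": "user", "content": "The sky is blue, true or false?"}
        ],
        "options": {"grammar": 'root ::= ("true" | "false")'},
    },
)
print(resp.json()["message"]["content"])  # constrained to "true" or "false"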

ravenscroftj · Apr 21 '24 20:04

It just isn't documented!

So, I can add docs about it in #3618.

Update: done.

markcda · Apr 22 '24 09:04

That's really, really cool. Though as far as I can see, it's not merged into main yet.

rhohndorf · Apr 25 '24 00:04

This would be so nice to have: one step closer to ditching my custom Python bindings for llama.cpp and being able to use Ollama, now that it also supports concurrent models. Sending a grammar together with the request is a great feature to support.

Nidvogr · Apr 28 '24 23:04

I added support for this and JSON schema in #5348.
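
For illustration only, a schema-constrained request might look like the sketch below. The option name json_schema is an assumption made up for this example, not necessarily the API that #5348 actually adds; check the PR for the real syntax.

import requests

# Hypothetical sketch: the "json_schema" option name is assumed for
# illustration and may not match what PR #5348 actually implements.
schema = {
    "type": "object",
    "properties": {"summary": {"type": "string"}},
    "required": ["summary"],
}
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3:8b",
        "stream": False,
        "messages": [{"role": "user", "content": "Summarize: the sky is blue."}],
        "options": {"json_schema": schema},
    },
)
print(resp.json()["message"]["content"])  # e.g. {"summary": "..."}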

mitar · Jun 27 '24 23:06

Hey all, I know there's an automated ping here, but to better align everyone, please check out and comment on my new call for clarity from the Ollama team: https://github.com/ollama/ollama/issues/6237. As always, please be civil and stay on topic! 😄

Kinglord · Aug 07 '24 16:08