llamafile
Is --grammar documented anywhere?
What are the possible values? Is there a way to force the response content to be JSON, like with ollama.ai?
I had a quick look at the llamafile codebase. I think you are looking for the llama.cpp command. As the readme states:
"Command-line llamafiles" run entirely inside your terminal and operate just like llama.cpp's "main" function. This means you have to provide some command-line parameters, just like with llama.cpp.
In this particular context, I would recommend reading https://github.com/ggerganov/llama.cpp/tree/master/grammars for further details.
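For a flavor of the format, here is a minimal hypothetical example of my own (not taken from the llama.cpp repo): a GBNF grammar that restricts the model's output to a yes/no answer.

```gbnf
# root is the entry rule; with this grammar the model can only emit "yes" or "no"
root ::= ("yes" | "no")
```

So the "possible values" of --grammar are not a fixed enum; the flag takes arbitrary GBNF rule text like the above (llama.cpp's main also accepts a file via --grammar-file).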
On further research, it looks like the llama.cpp team figured out in this pull request https://github.com/ggerganov/llama.cpp/pull/1887 a way to convert a JSON schema into a BNF-style grammar that can be accepted by --grammar.
This was one of their examples from the PR discussion linked above.
```shell
# LLAMA.CPP
./main -m $LLAMA2_13B_Q4_0 --grammar "$( python3 examples/json-schema-to-grammar.py ../schemas/getting-started-full.json --prop-order 'productName,price,productId,dimensions' )"

# OUTPUT
{"productName":"Blu-ray+DVD: The Good Dinosaur","price":10,"productId":452389,"dimensions":{"height":267,"length":152.4,"width":178},"tags":["Blu-ray","Comedy","Drama","Kids \u0026 Family","Sci-Fi \u0026 Fantasy"]} [end of text]
```
Note that the --grammar argument uses bash command substitution $( ) to run the script https://github.com/ggerganov/llama.cpp/blob/master/examples/json-schema-to-grammar.py with these inputs:
```shell
python3 examples/json-schema-to-grammar.py \
  ../schemas/getting-started-full.json \
  --prop-order 'productName,price,productId,dimensions'
```
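As a toy illustration of what that script does (this is my own simplified sketch, not the real converter, which handles nesting, arrays, references, and escaping), a flat object schema maps to GBNF rules roughly like this:

```python
# Toy sketch: turn a flat JSON Schema (string/number properties only)
# into a single GBNF root rule. The real examples/json-schema-to-grammar.py
# is far more complete; all names here are illustrative.
GBNF_PRIMITIVES = {
    "string": '"\\"" [^"]* "\\""',          # a quoted string
    "number": '"-"? [0-9]+ ("." [0-9]+)?',  # integer or decimal
}

def schema_to_gbnf(schema, prop_order=None):
    """Emit one root rule matching a JSON object with the given properties."""
    props = schema["properties"]
    order = prop_order or list(props)  # --prop-order analogue
    parts = []
    for i, name in enumerate(order):
        value_rule = GBNF_PRIMITIVES[props[name]["type"]]
        prefix = '"," ' if i else ""   # comma before every field but the first
        parts.append(f'{prefix}"\\"{name}\\":" ({value_rule})')
    return 'root ::= "{" ' + " ".join(parts) + ' "}"'

schema = {"type": "object",
          "properties": {"price": {"type": "number"},
                         "productName": {"type": "string"}}}
print(schema_to_gbnf(schema, prop_order=["productName", "price"]))
```

The key idea is the same as in the PR: every schema property becomes a literal key token plus a rule for its value type, so the model cannot emit anything but a conforming object.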
Hope that gives some possible pointers. This would also be good to note in the application guide.
Is it possible to do this in Node.js/JavaScript instead of Python?
Unsure, but it looks to be a common topic. Have a look at https://withcatai.github.io/node-llama-cpp/guide/grammar, which I found using the search term json-schema-to-grammar llama.cpp node. They have a GitHub repo for this:
- https://github.com/withcatai/node-llama-cpp
- Run AI models locally on your machine with Node.js bindings for llama.cpp. Force a JSON schema on the model output at the generation level
Another approach could be to rewrite examples/json-schema-to-grammar.py as a Node.js script.
Hope that answers your question. If you feel it helped, you may want to add your insight to some form of application-tips doc in this repo.
Contributions are welcome on documentation that explains how to use Backus-Naur form. The llama.cpp grammar system is so intuitive to me that I feel it needs little explanation. How would you document it in detail? You'd have to use Backus-Naur form.
Everything being built on top of the grammar, though, is an ongoing area of research. Things like restricting output to valid C++ code, or to a JSON object list that specifically matches the layout of your SQLite database. I'm skeptical that's even possible using grammar alone, given the state of today's technology; and if one area did work really well, I'd want the tool to be written in C/C++ so it could be part of the llamafile executable.
Just out of curiosity, does anyone have experience with how "scalable" the llama.cpp grammar implementation is? E.g. would it be feasible to push some ~100K strings into a grammar (say, a listing of product descriptions), either as a flat disjunction of clauses, or encoded as a trie? You would probably have to mitigate the greedy decoding with beam search or sth, but something about the idea of guiding/constraining the model output this way seems very interesting.
Anyways, thanks for this amazing work!
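For what it's worth, here is a rough sketch (my own illustration, not benchmarked against llama.cpp itself) of how a flat string list could be factored into a trie-shaped GBNF alternation, so shared prefixes are emitted once rather than once per string. Whether llama.cpp's sampler stays fast with ~100K strings encoded this way is a separate question I can't answer:

```python
END = ""  # sentinel key marking "a word ends here"

def build_trie(words):
    """Nested-dict trie over the input strings."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def trie_to_alt(node):
    """Emit a GBNF alternation matching exactly the strings in this subtrie."""
    branches = []
    for ch in sorted(node):
        if ch == END:
            branches.append('""')  # empty alternative: a word may end here
        else:
            branches.append(f'"{ch}" ({trie_to_alt(node[ch])})')
    return " | ".join(branches)

words = ["cat", "car", "dog"]
print("root ::= " + trie_to_alt(build_trie(words)))
# the shared "ca" prefix appears once, not once per word
```

A flat disjunction of 100K clauses would make the grammar text itself enormous; the trie encoding at least bounds it by the total shared-prefix-compressed character count.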
I was wondering something similar. I'm trying to use llamafile for a smart home voice controller, and was wondering if I could 'nudge' an assistant towards the names of devices and their properties. E.g. a user is likely to ask about the brightness of the livingroom light.
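As a sketch of that 'nudging' idea (my own illustration; the device and property names, and the command shape, are made up), you could generate a grammar from your device registry so the model can only ever emit known names:

```python
def command_grammar(devices, properties):
    """Build a GBNF grammar whose root matches e.g. 'livingroom light brightness'.
    Purely illustrative; a real command grammar would need verbs, values, etc."""
    device_alts = " | ".join(f'"{d}"' for d in devices)
    property_alts = " | ".join(f'"{p}"' for p in properties)
    return "\n".join([
        'root ::= device " " property',
        f"device ::= {device_alts}",
        f"property ::= {property_alts}",
    ])

print(command_grammar(
    devices=["livingroom light", "kitchen light", "thermostat"],
    properties=["brightness", "color", "temperature"],
))
```

The generated text would then be passed to --grammar, regenerating it whenever devices are added or renamed.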
> Just out of curiosity, does anyone have experience with how "scalable" the llama.cpp grammar implementation is? E.g. would it be feasible to push some ~100K strings into a grammar (say, a listing of product descriptions), either as a flat disjunction of clauses, or encoded as a trie?
+1 also looking for scalable // production-kind-of-ready
@trpstra did you get any further with this?