Feature Request: Tool calling
It would be nice to support tool calls when serving a model. For llama.cpp, this means passing --jinja and possibly --chat-template-file. These should probably be options in ramalama serve.
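For reference, when running llama.cpp's llama-server directly, those flags look roughly like this (the model and template paths here are placeholders):

llama-server -m ./Qwen2.5-7B-Instruct-Q4_K_M.gguf --jinja --chat-template-file ./my-template.jinja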
This may have been added in https://github.com/containers/ramalama/pull/952?
How does one pass a custom chat template file to ramalama serve? This is often needed, unfortunately. See here
It doesn't seem to be working even when the chat template in the GGUF is correct.
I ran: ramalama serve file:///home/ed/.cache/llama.cpp/bartowski_Qwen2.5-7B-Instruct-GGUF_Qwen2.5-7B-Instruct-Q4_K_M.gguf
And then the following command from the llama.cpp docs:
curl http://localhost:8080/v1/chat/completions -d '{
  "model": "gpt-3.5-turbo",
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "python",
        "description": "Runs code in an ipython interpreter and returns the result of the execution after 60 seconds.",
        "parameters": {
          "type": "object",
          "properties": {
            "code": {
              "type": "string",
              "description": "The code to run in the ipython interpreter."
            }
          },
          "required": ["code"]
        }
      }
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": "Print a hello world message with python."
    }
  ]
}'
{"error":{"code":500,"message":"tools param requires --jinja flag","type":"server_error"}}
I think we may be talking past each other with the phrase "tool calling". I mean the "tools" field in the OpenAI API.
Try "--runtime-args" "--jinja"
Sure, that will work. That is why I added --runtime-args in the first place.
Beyond that, though, I would argue this should be an abstraction in ramalama. I think @engelmi added some support for this, but only with --use-model-store? And I can't see how to change the chat template manually.
I'm gonna open a PR to enable --jinja everywhere and see if anything starts to fail... If things start to fail, I really think it's an issue that needs to be solved in llama.cpp land with fallback mechanisms, rather than here.
This is the wrong space to discuss these things in general; this is more on the llama.cpp side to address.
@edmcman You can use the --chat-template-file option for ramalama [run|serve] to specify a custom (jinja) template. If you use the --use-model-store option, then ramalama will try to use the metadata from the model file (if it's a GGUF) or try to determine if a template is specified in the model source (e.g., I think the config.json contains such information if present).
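For example, something like this should work, assuming a local template file (the filename here is hypothetical):

ramalama serve --chat-template-file ./qwen2.5-tools.jinja file:///home/ed/.cache/llama.cpp/bartowski_Qwen2.5-7B-Instruct-GGUF_Qwen2.5-7B-Instruct-Q4_K_M.gguf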
Yeah, part of the reason why I opened this issue was because a new user is not going to know about model store. I'm still a bit confused why it's not enabled by default.
Yes, you are right. It was, and still is, a breaking change, so it was hidden behind that feature flag. My plan was to write a migration that moves from the old storage to the new one seamlessly and then prunes the old storage code, but I don't have the time to do so, unfortunately.
We need to convert to model-store by default soon; next week is my plan. I think going forward we will do potentially breaking changes on odd releases, hopefully with conversion tools. We are moving Fedora to an even-releases-only schedule. As we get closer to a 1.0 release, we can slow down the release schedule.
We have gone through the model-store conversion; what is next for tool-calling?
I think we have been passing --jinja for llama.cpp by default for quite a while now. And with #1732 we are passing the custom template of the model again. Not sure if this already satisfies the tool-calling use case. WDYT? @ericcurtin
I agree we can close this, @engelmi. llama.cpp now has tool calling with jinja. I don't think it's 100% complete, but a lot of people are using the bits available in llama.cpp with great success.
If we want to enhance tool calling even more, new issues should probably be opened against upstream llama.cpp, and then RamaLama here downstream should upgrade.