
Feature Request: Tool calling

Open · edmcman opened this issue 10 months ago · 11 comments

It would be nice to support tool calls when serving a model. For llama.cpp, this means passing --jinja and possibly --chat-template-file. These should probably be options in ramalama serve.
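
For reference, the raw llama.cpp invocation would look roughly like this (a sketch; llama-server is the server binary, and the model and template paths are placeholders):

# Enable Jinja chat-template processing (needed for the OpenAI "tools" field),
# optionally overriding the template embedded in the GGUF:
llama-server -m model.gguf --jinja --chat-template-file ./my-template.jinja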

edmcman avatar Mar 11 '25 19:03 edmcman

This may have been added in https://github.com/containers/ramalama/pull/952?

edmcman avatar Apr 02 '25 14:04 edmcman

How does one pass a custom chat template file to ramalama serve? This is often needed, unfortunately. See here

edmcman avatar Apr 02 '25 14:04 edmcman

It doesn't seem to be working even when the chat template in the GGUF is correct.

I ran: ramalama serve file:///home/ed/.cache/llama.cpp/bartowski_Qwen2.5-7B-Instruct-GGUF_Qwen2.5-7B-Instruct-Q4_K_M.gguf

And then the following command from the llama.cpp docs:

curl http://localhost:8080/v1/chat/completions -d '{
"model": "gpt-3.5-turbo",
"tools": [
    {
    "type":"function",
    "function":{
        "name":"python",
        "description":"Runs code in an ipython interpreter and returns the result of the execution after 60 seconds.",
        "parameters":{
        "type":"object",
        "properties":{
            "code":{
            "type":"string",
            "description":"The code to run in the ipython interpreter."
            }
        },
        "required":["code"]
        }
    }
    }
],
"messages": [
    {
    "role": "user",
    "content": "Print a hello world message with python."
    }
]
}'
{"error":{"code":500,"message":"tools param requires --jinja flag","type":"server_error"}}

I think we may be talking past each other with the phrase "tool calling". I mean the "tools" field in the OpenAI API.
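
For reference, when this works, the assistant message comes back with a tool_calls array instead of plain content. An abbreviated, hand-written illustration of the OpenAI-style response shape (not actual server output):

{
  "choices": [
    {
      "finish_reason": "tool_calls",
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_0",
            "type": "function",
            "function": {
              "name": "python",
              "arguments": "{\"code\": \"print('hello world')\"}"
            }
          }
        ]
      }
    }
  ]
}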

edmcman avatar Apr 02 '25 15:04 edmcman

Try "--runtime-args" "--jinja"

ericcurtin avatar Apr 02 '25 16:04 ericcurtin

Sure, that will work. That is why I added --runtime-args in the first place.

Beyond that, though, I would argue this should be an abstraction in ramalama. I think @engelmi added some support for this, but only with --use-model-store? And I can't see how to change the chat template manually.

edmcman avatar Apr 02 '25 16:04 edmcman

I'm gonna open a PR to enable --jinja everywhere and see if anything starts to fail... If things do start to fail, I really think it's an issue that needs to be solved in llama.cpp land with fallback mechanisms, rather than here

ericcurtin avatar Apr 02 '25 16:04 ericcurtin

This is the wrong space to discuss these things in general, this is more on the llama.cpp side to address.

ericcurtin avatar Apr 02 '25 16:04 ericcurtin

@edmcman You can use the --chat-template-file option for ramalama [run|serve] to specify a custom (Jinja) template. If you use the --use-model-store option, then ramalama will try to use the metadata from the model file (if it's GGUF) or try to determine whether a template is specified in the model source (e.g., I think config.json contains such information, if present).
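
For example (a sketch; the template path and <model> are placeholders):

# Pass a custom Jinja template explicitly:
ramalama serve --chat-template-file ./my-template.jinja <model>
# Or let ramalama pick up template metadata from the model itself:
ramalama serve --use-model-store <model>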

engelmi avatar Apr 12 '25 11:04 engelmi

> @edmcman You can use the --chat-template-file option for ramalama [run|serve] to specify a custom (Jinja) template. If you use the --use-model-store option, then ramalama will try to use the metadata from the model file (if it's GGUF) or try to determine whether a template is specified in the model source (e.g., I think config.json contains such information, if present).

Yeah, part of the reason I opened this issue is that a new user is not going to know about the model store. I'm still a bit confused about why it's not enabled by default.

edmcman avatar Apr 12 '25 13:04 edmcman

> Yeah, part of the reason I opened this issue is that a new user is not going to know about the model store. I'm still a bit confused about why it's not enabled by default.

Yes, you are right. It was, and still is, a breaking change, so it was hidden behind that feature flag. My plan was to write a migration that moves from the old storage to the new one seamlessly and then prune the old storage code, but I don't have the time to do so, unfortunately.

engelmi avatar Apr 12 '25 17:04 engelmi

We need to convert to model-store by default soon; next week is my plan. I think going forward we will make potentially breaking changes on odd releases, hopefully with conversion tools. We are moving Fedora to an even-releases-only schedule. As we get closer to a 1.0 release, we can slow down the release cadence.

rhatdan avatar Apr 13 '25 10:04 rhatdan

We have gone through the model-store conversion; what is next for tool calling?

rhatdan avatar Jul 22 '25 13:07 rhatdan

I think we have been passing --jinja to llama.cpp by default for quite a while now. And with #1732 we are passing the custom template of the model again. Not sure if this already satisfies the tool-calling use case. WDYT? @ericcurtin
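
One quick way to check which template the running server is actually applying is llama.cpp's /props endpoint, which reports it (a sketch, assuming a recent llama-server and jq installed):

curl -s http://localhost:8080/props | jq -r .chat_template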

engelmi avatar Jul 23 '25 09:07 engelmi

I agree we can close this, @engelmi. llama.cpp now has tool calling with Jinja; I don't think it's 100% complete, but a lot of people are using the bits available in llama.cpp with great success.

If we want to enhance tool calling even more, new issues should probably be opened against upstream llama.cpp, then downstream RamaLama here should upgrade.

ericcurtin avatar Jul 25 '25 15:07 ericcurtin