ramalama icon indicating copy to clipboard operation
ramalama copied to clipboard

Better chat template handling - support Jinja

Open bentito opened this issue 10 months ago • 15 comments

I'm not sure I know enough to write this issue properly, but I'm looking for ramalama to not punt when it can't manage to understand the chat template. I'm thinking specifically about templates supporting function calling and tool calling. Also, better debugging with the client hitting the endpoint seems out of sync with the expected template.

bentito avatar Feb 26 '25 16:02 bentito

Any help making this issue more correct taxonomy and function-wise would be helpful.

bentito avatar Feb 26 '25 16:02 bentito

I read this as jinja support basically which we will look into soon. Tagging @engelmi for awareness

ericcurtin avatar Feb 26 '25 16:02 ericcurtin

Yes, I think jinja support. I'll change the issue title

bentito avatar Feb 26 '25 22:02 bentito

Does this mean that ramalama does not support tool calling yet? If so, might be nice to add to the issue title.

edmcman avatar Mar 11 '25 13:03 edmcman

Does this mean that ramalama does not support tool calling yet? If so, might be nice to add to the issue title.

I'm not sure. I can say I have not had much luck so far with tool or function calling with ramalama served models. I will add more data here if I find it.

bentito avatar Mar 11 '25 15:03 bentito

Does this mean that ramalama does not support tool calling yet? If so, might be nice to add to the issue title.

Depends on the tool, some are compatible, some aren't

ericcurtin avatar Mar 11 '25 17:03 ericcurtin

@bentito Do you mean these jinja built-in functions, for example? Could you provide an example template?

In #917 support to use the respective chat template from the model (e.g. extracted from the gguf file) was added to the run command - will be added to the serve cmd as well soon. Not sure, though, if the underlying implementation in llama.cpp handles function/tool calls. (You need to use the --use-model-store option in order to use the chat template support, i.e. ramalama --use-model-store )

engelmi avatar Mar 12 '25 06:03 engelmi

llama.cpp just recently gained support for tool calling

edmcman avatar Mar 12 '25 13:03 edmcman

Tool calling has been available in llama.cpp for a long time via llama-server (which is the backend for "ramalama serve") but not all tools like that API, so it depends.

ericcurtin avatar Mar 12 '25 13:03 ericcurtin

Does this mean that ramalama does not support tool calling yet? If so, might be nice to add to the issue title.

jinja is a template format. What is meant by tool calling here?

ericcurtin avatar Mar 12 '25 13:03 ericcurtin

Tool calling

It was recently (January) added to llama.cpp's server: https://github.com/ggml-org/llama.cpp/pull/9639

The relation to jinja is that the jinja chat templates also specify the syntax in which the model expects to be informed about available tools: https://huggingface.co/docs/transformers/main/en/chat_extras#tools

edmcman avatar Mar 12 '25 14:03 edmcman

@edmcman @ericcurtin any update on this?

rhatdan avatar Apr 02 '25 11:04 rhatdan

Tbh, I think this issue is too generic we have connected > 10 popular tools with ramalama. A generic "tool calling" issue probably doesn't make sense

ericcurtin avatar Apr 02 '25 14:04 ericcurtin

Tbh, I think this issue is too generic we have connected > 10 popular tools with ramalama. A generic "tool calling" issue probably doesn't make sense

I think a generic tool calling issue does make sense, because AFAIK ramalama still does not support tool calling in ramalama serve. I have an issue for that open in #947.

I'm not really sure what this issue is about though. IMHO without an example it is useless.

edmcman avatar Apr 02 '25 14:04 edmcman

Goose, open webui, aider, anythingllm, etc. Have all been tested to work, generic tool calling does work. It's individual tools that may have issues (which sometimes end up in user error solved by documentation).

Closing generic tool calling issue as the OpenAI API pointed at is implemented

ericcurtin avatar Apr 02 '25 14:04 ericcurtin

@engelmi Where do we stand on this?

rhatdan avatar Jul 22 '25 13:07 rhatdan

I think we can close this if this is only about jinja templates and tool calling is asked for in #947. Closing. (@bentito please reopen if I misunderstood this issue)

engelmi avatar Jul 25 '25 10:07 engelmi