
Feature Request:

Open linuxmagic-mp opened this issue 1 month ago • 13 comments

Prerequisites

  • [x] I am running the latest code. Mention the version if possible as well.
  • [x] I carefully followed the README.md.
  • [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [x] I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Background: I ran into problems using Mistral Nemo Instruct 2407 with tools. The root issue is that even though the model is tool aware, it expects tools to be provided in a non-OAI-compatible format, e.g. wrapped in [AVAILABLE_TOOLS]; hence the use of the orchestrator and .jinja templates to rewrite the OAI-compatible request into Mistral-compatible form. However, this creates an issue. If you decide to define the tools in the .jinja template, that rewrites the prompt, but the server sees no 'tools' in the original request, so it never sets the logic or params to use tools, and thus the task will not examine the result from the LLM for tool calls. (Ashamed to admit, this took me a lot longer to shake out than it should.) Technically, of course, the server should see the LLM as tool aware from the GGUF itself, pull out the 'Mistral Nemo' name, and mark it as tool aware, which doesn't appear to occur. The logic appears flawed, e.g.:

// Plain handler (no tools)
if (params.tools.is_null() || inputs.tool_choice == COMMON_CHAT_TOOL_CHOICE_NONE) {
    if (params.tools.is_null()) {
        LOG_DBG("MP: Short circuit, doesn't reach Nemo, tools is null\n"); // my added debug line
    }
    return common_chat_params_init_without_tools(tmpl, params);
}

// Mistral Nemo (w/ tools)
if (src.find("[TOOL_CALLS]") != std::string::npos) {
    return common_chat_params_init_mistral_nemo(tmpl, params);
}

// Generic fallback
return common_chat_params_init_generic(tmpl, params);

As you can see, the Mistral parsing never gets reached, because params.tools.is_null() is still true in common_chat_templates_apply_jinja().
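
For illustration only, a naive reordering (an untested sketch against the dispatch code above, not a concrete proposal) would move the marker check ahead of the short circuit:

// Untested sketch: check the template source for the Mistral Nemo marker
// BEFORE the tools-is-null short circuit, so a template that embeds its
// own tools still reaches the Nemo tool-call parser.
if (src.find("[TOOL_CALLS]") != std::string::npos) {
    return common_chat_params_init_mistral_nemo(tmpl, params);
}

// Plain handler (no tools)
if (params.tools.is_null() || inputs.tool_choice == COMMON_CHAT_TOOL_CHOICE_NONE) {
    return common_chat_params_init_without_tools(tmpl, params);
}

// Generic fallback
return common_chat_params_init_generic(tmpl, params);

Whether that is safe for requests that genuinely carry no tools is part of the design question.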

Now, before I start looking at a pull request: this appears to be a logic problem that should first be discussed at the design level, to decide how best to approach it.

HTTP request
    ↓
oaicompat_chat_params_parse() → produces JSON "data"
    ↓
orchestrator (Jinja template) runs ONLY to build the text prompt
    ↓
params_from_json_cmpl() → copies fields from "data" into task.params
    ↓
task submitted to inference queue
    ↓
LLM generates output
    ↓
common_chat_parse() → checks task.params.tools

My 'suggestion' is to audit this, so that the server knows the model is tool aware based on the .gguf template. Then we still have to indicate to the task that 'tools' are available, and that conclusion cannot be fully drawn until after the orchestrator runs; so either the orchestrator needs to be responsible for setting the params, or the server needs to recognize that a Mistral-style .jinja template was used and re-parse the resulting prompt for the [AVAILABLE_TOOLS] block, as sketched below.
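
To make the second option concrete, here is a rough sketch of what I mean (untested; detect_embedded_tools is a hypothetical helper, not existing llama.cpp API, and json is the nlohmann type llama.cpp already uses):

// Hypothetical post-render step: after the Jinja template has produced the
// final prompt, scan it for a native Mistral tools block and, if one is
// found, populate the task's tools from it. All names are placeholders.
static void detect_embedded_tools(const std::string & prompt, json & tools_out) {
    const std::string open  = "[AVAILABLE_TOOLS]";
    const std::string close = "[/AVAILABLE_TOOLS]";

    const size_t begin = prompt.find(open);
    if (begin == std::string::npos) {
        return; // the template did not embed any tools
    }
    const size_t end = prompt.find(close, begin);
    if (end == std::string::npos) {
        return; // malformed block, leave the params untouched
    }

    // The block body is a JSON array of tool definitions.
    const std::string body = prompt.substr(begin + open.size(), end - (begin + open.size()));
    json parsed = json::parse(body, nullptr, /*allow_exceptions=*/false);
    if (parsed.is_array()) {
        tools_out = std::move(parsed);
    }
}

(The marker scan is cheap; the JSON parse only runs when the block is actually present.)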

I need feedback on how this should be approached.

Motivation

Bug: Jinja templates cannot set params for tools.

Possible Implementation

No response

linuxmagic-mp avatar Nov 11 '25 17:11 linuxmagic-mp

Sounds like you're just not using the correct chat template; try the one included in models/templates/mistralai-Mistral-Nemo-Instruct-2407.jinja.

CISC avatar Nov 11 '25 19:11 CISC

No, this isn't a case of not using the right template; it's a case where tools specified in a Mistral .jinja template cannot cause the server to set its configuration params for tools.

linuxmagic-mp avatar Nov 12 '25 15:11 linuxmagic-mp

I'm not sure what you mean by Mistral .jinja template; the template I mentioned has tools.

CISC avatar Nov 12 '25 17:11 CISC

The template shipped with llama.cpp, ./models/templates/mistralai-Mistral-Nemo-Instruct-2407.jinja, does NOT define tools; it assumes tools are already present in the request that the orchestrator rewrites to pass to the Mistral LLM (e.g. in [AVAILABLE_TOOLS]). Given that, the logic will NOT set params.tools.* in the task, so the parser that handles the LLM's tool_calls response is never called. In 'my' edited version of ./models/templates/mistralai-Mistral-Nemo-Instruct-2407.jinja, I set tools in the template, and as mentioned, the if (params.tools.is_null()) short circuit fires (logging "MP: Short circuit, doesn't reach Nemo, tools is null"), so execution never reaches the // Mistral Nemo (w/ tools) branch in chat.cpp.

linuxmagic-mp avatar Nov 12 '25 19:11 linuxmagic-mp

This is where tools are added, provided you pass tools; this is how all templates work: https://github.com/ggml-org/llama.cpp/blob/92bb442ad999a0d52df0af2730cd861012e8ac5c/models/templates/mistralai-Mistral-Nemo-Instruct-2407.jinja#L27-L50

CISC avatar Nov 12 '25 20:11 CISC

I completely understand that. FYI, I spent over 30 hours carefully going over the underlying logic before I posted this as an issue. What you are describing is the case where the original request, e.g. from the llama-server interface, already has tools as part of the prompt, i.e. the 'if tools is not none' path. That can be satisfied by the server before it calls the orchestrator (it found OAI-compatible tool definitions in the original request), so it can call the appropriate .jinja template handler; OR tools can be explicitly specified in the .jinja template itself. In the latter case, params.tools will never be populated. For clarity, the actual prompt sent to the LLM in my case is:

[INST]system prompt[/INST]
[AVAILABLE_TOOLS][
  {"type": "function", "function": {"name": "brave_mcp", "description": "Perform a Brave MCP search and return summarized results.", "parameters": {"type": "object", "properties": {"query": {"type": "string", "description": "Search query to send to Brave MCP"}, "num_results": {"type": "integer", "description": "Number of results to return"}}, "required": ["query"]}, "usage": "Use this tool whenever the user asks to search the web, or if you need current information only available by searching the web. Respond with elements of the search result in your answer.", "script": "/models/agents/tools/brave_mcp.py", "id": "TOOL-9566C98B"}},
  {"type": "function", "function": {"name": "get_time", "description": "Return the current time in a specified timezone. Example usage: {'timezone': 'UTC'}", "parameters": {"type": "object", "properties": {"timezone": {"type": "string"}}, "required": []}, "usage": "Use this tool whenever the user asks what time it is. Respond with the time result in your answer.", "script": "/models/agents/tools/get_time.py", "id": "TOOL-9B81AB56"}},
  {"type": "function", "function": {"name": "get_weather", "description": "Return current weather for a given city. Example usage: {'city': 'Vancouver'}", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}, "usage": "Use this tool whenever the user asks about current weather in a city. Respond with the weather result in your answer.", "script": "/models/agents/tools/get_weather.py", "id": "TOOL-66809F3B"}},
  {"type": "function", "function": {"name": "whoami", "description": "Reports the effective system user the LLM is running as.", "parameters": {"type": "object", "properties": {"command": {"type": "string", "default": "whoami"}, "args": {"type": "array", "items": {"type": "string"}, "default": []}}, "required": []}, "usage": "Use this tool anyone asks about what operating system the LLM is running as", "script": "/models/agents/tools/whoami.py", "id": "TOOL-933E3661"}}
][/AVAILABLE_TOOLS]
[INST]what time is it?[/INST]

However, debug logs show that params never gets populated with that information for the task.

linuxmagic-mp avatar Nov 12 '25 21:11 linuxmagic-mp

So, if I understand correctly, what you want is to enable tool handling even when tools is not provided?

CISC avatar Nov 12 '25 21:11 CISC

Well, this 'should' be enshrined in the use of .jinja templates of course, for many reasons. If 'tools' is provided in the template, that should be treated the same as any other prompt where tools are embedded before reaching the server. My communication style must be poor, as I thought that was evident in the first post. The server ONLY sets tool handling based on the original OAI-compatible request (e.g. from the GUI), when it should know from the .gguf template/headers that Mistral is 'capable' of tool handling; but even that can be a weak assumption about an LLM, given that a LoRA training layer might give the LLM tool awareness. The only 'safe' way is to examine the final prompt for the LLM, to see if tools are present in the native format for the specific LLM. And this matters given the ability to have custom templates, e.g. in my case to automatically generate new .jinja files as new tools become available. There is already code in llama.cpp to examine a template for [AVAILABLE_TOOLS], but that doesn't trigger. The .jinja template is correctly 'loaded' by the server on user input ONLY to convert the user prompt to the native LLM style/format, but the task is started independently with no knowledge of the tools; by the time the server/LLM accept the new user prompt, it's either too late in the logic to update the task params, or there is no mechanism to get tools from the resulting final LLM prompt. The server should know, from the usage of the template, that the user call format is of type 'Mistral Nemo'.
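
As a rough illustration of that check (a sketch only; template_is_tool_aware and its marker list are mine, not existing llama.cpp code), the server could grep the template source, or the rendered prompt, for native markers the same way chat.cpp already greps for [TOOL_CALLS]:

// Sketch: decide tool awareness from the template source (or the rendered
// prompt) itself. The helper name and marker list are illustrative
// placeholders, not existing llama.cpp API.
static bool template_is_tool_aware(const std::string & src) {
    static const char * markers[] = {
        "[TOOL_CALLS]",       // Mistral Nemo tool-call marker
        "[AVAILABLE_TOOLS]",  // Mistral native tools block
        "tools is not none",  // generic Jinja tools guard
    };
    for (const char * m : markers) {
        if (src.find(m) != std::string::npos) {
            return true;
        }
    }
    return false;
}

That would at least let the server flag the task as potentially tool-aware before task.params is frozen.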

linuxmagic-mp avatar Nov 12 '25 22:11 linuxmagic-mp

So, if I understand correctly, what you want is to enable tool handling even when tools is not provided?

I think he wants to enable tool handling when tools are provided (hardcoded) within the Jinja template.

pwilkin avatar Nov 12 '25 23:11 pwilkin

So basically, the feature request, if I understand it correctly, is "properly handle cases where tool definitions are hardcoded in the template instead of passed via the tools parameter at runtime".

pwilkin avatar Nov 12 '25 23:11 pwilkin

Close, and effectively yes ;) But I think this is a logic problem, so I'm trying to figure out the 'intent' of the designers of llama-server on how this should be addressed. Are .jinja templates a long-term direction for handling completion requests? Why is the Mistral code block explicitly AFTER the test for tools, rather than before it, as with some of the other cases? Should the template handler (orchestrator) modify the task params directly? Or should this be a post-orchestrator examination directly on the prompt as it is fed into the LLM task?

linuxmagic-mp avatar Nov 12 '25 23:11 linuxmagic-mp

Why is the Mistral code block explicitly AFTER the test for tools, rather than before it, as with some of the other cases?

I'm pretty sure this was done on purpose at some point; I can't find exactly when and why right now.

CISC avatar Nov 13 '25 09:11 CISC

This is why I am looking for guidance from the llama.cpp team. I have the option of patching this in several ways, and could make a pull request, but I need to understand the 'plan' behind the design in order to place the logic in the correct part of the flow.

linuxmagic-mp avatar Nov 13 '25 16:11 linuxmagic-mp