webui: add OAI-Compat Harmony tool-call streaming visualization and persistence in chat UI
- Purely visual and diagnostic change, no effect on model context, prompt
construction, or inference behavior
- Captured assistant tool call payloads during streaming and non-streaming
completions, and persisted them in chat state and storage for downstream use
(see the streaming-merge sketch below)
- Exposed parsed tool call labels beneath the assistant's model info line
with graceful fallback when parsing fails
- Added tool call badges beneath assistant responses that expose JSON tooltips
and copy their payloads when clicked, matching the existing model badge styling
- Added a user-facing setting to toggle tool call visibility in the Developer
settings section, directly under the model selector option
Closes https://github.com/ggml-org/llama.cpp/issues/16597
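For reviewers, a minimal sketch (not the PR's actual code; names are illustrative) of the merge step the streaming capture has to do: OpenAI-compatible servers stream tool calls as `choices[0].delta.tool_calls` fragments that must be merged by index before they can be persisted or displayed.

```ts
// Minimal sketch, not the PR's actual code: merging OpenAI-compatible
// streamed tool-call deltas into complete payloads that can be persisted
// and rendered once the stream ends.
interface CapturedToolCall {
  id?: string;
  name: string;
  arguments: string; // raw JSON string as emitted by the model
}

interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

export function mergeToolCallDeltas(
  target: CapturedToolCall[],
  deltas: ToolCallDelta[]
): void {
  for (const delta of deltas) {
    // Each chunk names the tool call it belongs to via `index`;
    // name and arguments arrive as string fragments to concatenate.
    const call = (target[delta.index] ??= { name: '', arguments: '' });
    if (delta.id) call.id = delta.id;
    if (delta.function?.name) call.name += delta.function.name;
    if (delta.function?.arguments) call.arguments += delta.function.arguments;
  }
}
```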
I have to do a little cleanup; the patch was not merged properly on my side. Moving to draft.
This PR is now clean, but it was developed after this one: https://github.com/ggml-org/llama.cpp/pull/16562
Alright, @ServeurpersoCom, let's move forward with this one after merging #16562 ;) Let me know when you've addressed the merge conflicts and I'll gladly review the code
For the tool call inspector, do you prefer having one spoiler block per tool call, or a single aggregated spoiler wrapping all tool calls in the message?
It's rebased/reworked now. Force-pushed :)
Feel free to dissect the architecture as deeply as you want! Component boundaries, store coupling, service layering, anything that smells non-idiomatic. Also, if we end up polishing this feature further, I'm thinking it could live in a dedicated module for cleaner boundaries?
lib/
└─ toolcalls/
├─ toolcall-service.ts
├─ toolcall-store.ts
├─ ToolCallBlock.svelte
└─ ToolCallItem.svelte
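If we go that way, here is a rough sketch of the boundary (hypothetical names, assuming the webui's existing Svelte store conventions): `toolcall-store.ts` would own only the captured payloads keyed by message id, the service would fill it during streaming, and the two components would just subscribe.

```ts
// Hypothetical toolcall-store.ts sketch: captured tool calls keyed by
// assistant message id, populated by the service and read by the components.
import { writable } from 'svelte/store';

export interface StoredToolCall {
  id?: string;
  name: string;
  arguments: string; // raw JSON string, parsed lazily for the tooltip
}

export const toolCallsByMessage = writable<Map<string, StoredToolCall[]>>(new Map());

// Record (or replace) the tool calls captured for one assistant message.
export function setToolCalls(messageId: string, calls: StoredToolCall[]): void {
  toolCallsByMessage.update((current) => {
    const next = new Map(current);
    next.set(messageId, calls);
    return next;
  });
}
```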
And we could even imagine the architecture being reusable later: like having a small JavaScript execution module decoupled from the UI, so the model could actually interact with a JS thread it coded itself.
That would also cover, in a more generic way, the proposal from PR #13501 by @samolego, but in this case the model would generate and run its own JS tools. Done properly, it's no more of a security risk than the HTML/JS preview you get in Hugging Face Chat or Claude!
Includes a very small optimization from the previous PR (scroll listener removal). It landed here intentionally :D
Testing:
Add this:
```json
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "simple_addition_tool",
        "description": "A dummy calculator tool used for testing multi-argument tool call streaming.",
        "parameters": {
          "type": "object",
          "properties": {
            "a": {
              "type": "number",
              "description": "The first number to add."
            },
            "b": {
              "type": "number",
              "description": "The second number to add."
            }
          },
          "required": ["a", "b"]
        }
      }
    }
  ]
}
```
Here:
And ask the model:
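If the call goes through, the assistant message should carry a tool call shaped roughly like the object below (illustrative values only); this is the payload the badge tooltip shows and the copy button places on the clipboard.

```ts
// Illustrative only: the shape of a captured OpenAI-compatible tool call.
const exampleToolCall = {
  id: 'call_abc123', // server/model generated; this value is made up
  type: 'function',
  function: {
    name: 'simple_addition_tool',
    arguments: '{"a": 2, "b": 2}', // arguments arrive as a raw JSON string
  },
};
```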
Rebase / Format / Build
@ServeurpersoCom please re-base & rebuild
rebased and rebuilt
Hello @ServeurpersoCom. Thank you for implementing tool calling. I'm highly interested in it. I was trying to reproduce your 2+2 calculation example. Unfortunately, the tool calling part wasn't working, as shown below.
I wonder how I could attach a script to perform the actual computation. Do I need to use a browser script that detects the tool-call clipboard copy button and sends the tool-call response to an API? Or is there an API endpoint my script/program should listen to for tool-call events so that it can perform the computation?
It's a tool_calls debugger only, and it works. In your screenshot you get a "simple_addition_tool" badge under the (empty) assistant message. Hover or click it to read the function call written by your model!
We're working hard with Alek on the MCP client. Here's what it does in dev: https://github.com/user-attachments/assets/ac188e22-9bbf-48a0-8e12-f655ec5a4ecd
Oh, OK. So, at this point, is there any way I could actually execute the calculation function with the UI provided by llama-server? I've got a CLI program working that can do the calculation using the OpenAI-compatible API of llama-server. I don't want to reinvent the UI if it already exists.
Yes you don’t need to reinvent the wheel. But the UI is still in development, and it’s a heavy piece of work. That’s exactly why MCP exists: the model emits tool calls, you wrap them, and you send them to an MCP server that returns the result into the context. If you’re comfortable with sysadmin work, I can give you what you need.
Oh sure, tell me more. I just need a starting point, particularly on how to intercept the model's tool-call signal and return the appropriate result to the model while using the llama-server UI. Just to confirm, would this mechanism work on the current master branch of llama.cpp?
As for MCP, if it's not absolutely required, I guess I can explore that on my own later. :P
Again, thanks a lot for working on this feature. People like me are highly thankful for your work.
EDIT: Oh wait. Did you actually mean that the UI isn't ready and I have to use other methods to get the tool-calling mechanism working for now?
Absolutely
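As a starting point, the manual loop against llama-server's OpenAI-compatible endpoint looks roughly like this. It's a sketch under assumptions (llama-server started with `--jinja` on the default port 8080, non-streaming for brevity), not webui code:

```ts
// Sketch only: one manual tool-call round trip against llama-server's
// OpenAI-compatible /v1/chat/completions endpoint.
const ENDPOINT = 'http://localhost:8080/v1/chat/completions';

const tools = [{
  type: 'function',
  function: {
    name: 'simple_addition_tool',
    description: 'A dummy calculator tool used for testing multi-argument tool call streaming.',
    parameters: {
      type: 'object',
      properties: {
        a: { type: 'number', description: 'The first number to add.' },
        b: { type: 'number', description: 'The second number to add.' },
      },
      required: ['a', 'b'],
    },
  },
}];

async function chat(messages: any[]): Promise<any> {
  const res = await fetch(ENDPOINT, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages, tools, stream: false }),
  });
  return (await res.json()).choices[0].message;
}

async function main() {
  const messages: any[] = [{ role: 'user', content: 'What is 2+2? Use the tool.' }];

  // 1) The model answers with tool_calls instead of plain content.
  const assistant = await chat(messages);
  messages.push(assistant);

  // 2) Execute each call locally and feed the result back as a "tool" message.
  for (const call of assistant.tool_calls ?? []) {
    const args = JSON.parse(call.function.arguments);
    messages.push({
      role: 'tool',
      tool_call_id: call.id,
      content: String(args.a + args.b), // our "tool" just adds the numbers
    });
  }

  // 3) Ask again: the model now sees the tool result in its context.
  console.log((await chat(messages)).content);
}

main();
```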
