Misc. bug: tool calls are broken
Name and Version
Why would anyone implement syntax error checking of the escaped JSON inside the LLM response in a way that does not work? What was the point?
see more info: https://github.com/ikawrakow/ik_llama.cpp/issues/750
Operating systems
No response
Which llama.cpp modules do you know to be affected?
No response
Command line
Problem description & steps to reproduce
// TODO
First Bad Commit
No response
Relevant log output
@magikRUKKOLA Yes, I can confirm from my template implementation that this code is buggy as hell. From my personal experience, the only reliable workaround has been to disable partial streaming for tool calls altogether - keep partial streaming for regular content, but for tool calls only stream them once the entire tool call has been parsed. I will sit down for a refactor of that buggy mess (together with adding some sufficiently complex test cases to catch most of the culprits) when I'm done with Qwen3Next.
Basically, if you want a quick working, if somewhat UX-unfriendly, solution, you can look at what I did in common_chat_parse_nemotron_v2.
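For illustration only, here is a minimal sketch of that workaround (not the actual common_chat_parse_nemotron_v2 code, and all names below are hypothetical): buffer the streamed tool-call text and only surface it once the buffered payload parses as complete JSON. It assumes nlohmann::json, which llama.cpp vendors in its common code.

```cpp
// Hypothetical sketch of "no partial tool streaming": accumulate the
// streamed tool-call payload and only yield it once it is valid JSON.
#include <nlohmann/json.hpp>
#include <optional>
#include <string>

struct tool_call_buffer {
    std::string raw; // tool-call text accumulated so far

    // Append a new chunk from the model; returns the parsed tool call
    // exactly once, when the buffered text becomes complete valid JSON.
    std::optional<nlohmann::json> push(const std::string & chunk) {
        raw += chunk;
        if (!nlohmann::json::accept(raw)) {
            return std::nullopt; // still incomplete (or malformed so far)
        }
        nlohmann::json parsed = nlohmann::json::parse(raw);
        raw.clear();
        return parsed;
    }
};
```

The obvious trade-off, as discussed below, is that the client sees nothing of the tool call until it is fully generated.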
@pwilkin
but for tool calls only stream them once the entire tool call has been parsed.
We both know that this is not a real solution. The whole point of streaming, be it a regular LLM response or a tool call, is to get the output tokens as soon as possible. For example, if the user sees that the LLM is trying to do some stupid shit in the tool call, it would be logical to cancel the response right away and add some clarifications to the initial prompt. Now we are in a situation where such simple functionality is not implemented lol. The code needs to be rewritten ASAP. Moreover, there are supposed to be tests that run a certain LLM quant with a certain seed to make sure the tool-call functionality works as intended. Otherwise, the code is not production-ready at all. This is very sad.
Yeah, as I said, I'm aware of this, but the Qwen3 Next conversion is proving to be extremely time-consuming, to say the least. We don't really need tests on live models, but more robust tests for streaming would certainly be required.
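As a rough idea of what a model-free streaming test could look like - purely a hypothetical sketch, not an existing test in the repo - one can replay a known tool call in varying chunk sizes and assert that a complete, syntactically valid JSON payload is reported exactly once, at the very end:

```cpp
// Hypothetical streaming test sketch: no live model needed.
// Split a known tool call into chunks of every size and check that the
// buffered text only becomes valid JSON once the full payload has arrived.
#include <nlohmann/json.hpp>
#include <cassert>
#include <string>

int main() {
    const std::string full =
        R"({"name":"get_weather","arguments":{"city":"Berlin"}})";

    for (size_t chunk = 1; chunk <= full.size(); ++chunk) {
        std::string buffered;
        int completions = 0;
        for (size_t i = 0; i < full.size(); i += chunk) {
            buffered += full.substr(i, chunk);
            if (nlohmann::json::accept(buffered)) {
                ++completions;             // should fire only on the last chunk
                assert(buffered == full);  // and only for the complete payload
            }
        }
        assert(completions == 1);
    }
    return 0;
}
```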
This issue was closed because it has been inactive for 14 days since being marked as stale.