qwen3-coder tool call parser
> [!NOTE]
> Original work and PR by bold84 @ https://github.com/ggml-org/llama.cpp/pull/15019
This pull request resolves #15012 and introduces comprehensive support for the Qwen3-Coder model family's XML-based tool-calling format. It includes a new, robust XML parser and updated chat template detection logic to ensure reliable function calling.
Key Changes:
- New XML Parser (`common/chat-parser.cpp`):
  - A dedicated, non-streaming XML parser has been implemented to handle Qwen3-Coder's specific output format (see the sketch after this list).
  - Features include robust attribute parsing, improved error reporting, and efficient function lookups using a hash set.
- Chat Template Detection (`common/chat.h`, `common/chat.cpp`):
  - The chat template detection logic has been updated to correctly identify Qwen3-Coder models, preventing conflicts with other formats such as Hermes 2.
  - Ensures the `QWEN3_CODER_XML` format is applied consistently, even when no tools are explicitly provided in the request.
- Comprehensive Tests (`tests/test-chat.cpp`):
  - Comprehensive tests for the parser logic have been implemented.
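For reference, a minimal sketch of a well-formed tool call in this format, reconstructed from the (truncated) sample reported later in this thread; `get_weather` and `city` are hypothetical names:

```xml
<tool_call>
<function=get_weather>
<parameter=city>
Berlin
</parameter>
</function>
</tool_call>
```

The parser is expected to map output like this to a structured tool call (function name plus arguments) rather than leaving it as raw text in the response.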
Known issues:

- The model (Qwen3-Coder-30B-A3B-Instruct-UD-Q*_K_XL.gguf) occasionally stops prefixing tool calls with the proper `<tool_call>` tag (roughly illustrated below). This seems to be an issue with the model itself(?).
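Roughly, again with hypothetical `get_weather`/`city` names, such an output begins directly with the function tag instead of the `<tool_call>` prefix:

```xml
<function=get_weather>
<parameter=city>
Berlin
</parameter>
</function>
```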
Anecdotally, I observed that the previous PR (and presumably this PR too) essentially fixed tool calling for qwen3-coder. However, when trying to use it with `codex`, `qwen3-coder` absolutely refuses to use the `apply_patch` tool, opting to use `sed` instead, which is probably just a training issue?
It would be nice to get this PR merged in.
> Anecdotally, I observed that the previous PR (and presumably this PR too) essentially fixed tool calling for qwen3-coder. However, when trying to use it with `codex`, `qwen3-coder` absolutely refuses to use the `apply_patch` tool, opting to use `sed` instead, which is probably just a training issue? It would be nice to get this PR merged in.
I guess you could test it through openrouter or something and check if you see the same behavior there as well. My guess would be that it's a model thing and not so much this PR. Or maybe even a codex thing since it's probably heavily optimized for GPT models in terms of system prompt and tool descriptions.
Hey, just to confirm that running this branch fixes the integration with Qwen3-Coder-30B-A3B.
Reproduction steps:
```sh
# Compile this branch
mkdir $HOME/bin; cd $HOME/bin
git clone https://github.com/marceldev89/llama.cpp.git llama.cpp-fork-sources && cd llama.cpp-fork-sources
cmake -Bbuild && cmake --build build --target llama-server --parallel

# Install qwen
brew install qwen-coder

# Launch model
$HOME/bin/llama.cpp-fork-sources/build/bin/llama-server --port 8012 --host 0.0.0.0 --jinja -ngl 99 -c 300000 -m $HOME/.lmstudio/models/hf.co/hf.co-unsloth-Qwen3-Coder-30B-A3B-Instruct-GGUF-UD-Q4-K-XL-GGUF/hf.co-unsloth-Qwen3-Coder-30B-A3B-Instruct-GGUF-UD-Q4-K-XL.gguf

# Launch qwen
OPENAI_API_KEY=no OPENAI_BASE_URL=http://localhost:8012/v1 OPENAI_MODEL=models/hf.co-unsloth-Qwen3-Coder-30B-A3B-Instruct-GGUF-UD-Q4-K-XL.gguf qwen
```
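As an optional sanity check (not part of the original steps), here is a minimal request against the server started above on port 8012, using a hypothetical `get_weather` tool; the `"model"` value is just a placeholder:

```sh
curl http://localhost:8012/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-coder",
    "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'
```

If the parser is working, the assistant message should come back with a structured `tool_calls` array instead of raw `<tool_call>` XML in `content`.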
PS: I opened too many tabs while figuring this out and can't find the sources any more to credit them properly. I invented nothing here; credit goes to whoever wrote these pieces first.
@MartyLake can you also try opencode and see if it works well? https://github.com/sst/opencode/issues/1890
I've confirmed that this branch also fixes the issue on opencode.ai.
I tested with unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q4_K_M.
- `ggml-org:b7018`
- `marceldev89:qwen3-coder_tool_call_parser`

(tool call `todowrite` is handled properly)
This feature is really needed, please merge this PR as soon as possible.
> @grigio can you also try opencode and see if it works well?

Can confirm it does work well with opencode too.
Hi, I'm using

```sh
TAG="llama.cpp-b6652.amd0_rocm7.0.0_ubuntu24.04"
MODEL="Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf"
```

with the docker image `rocm/llama.cpp:${TAG}_server`, and I'm getting incomplete tags in LLM responses like the one below. Would this PR help?
Let me look for any files that might contain the pattern we're looking for, perhaps using a different approach:

```xml
<tool_call>
<function=mcphub__list_directory
<parameter=path
/home/sarunas/dev/subdir
</parameter>
</function>
</tool_call>
```
I'm trying to figure out whether I should build it myself, since I've seen reports elsewhere that it helped :)
Sorry for the silly question above; this PR does help, thanks!
Seems like another PR (https://github.com/ggml-org/llama.cpp/pull/16932) was merged a few days ago that makes this obsolete. Giving it a quick test with Qwen3 Coder to see if it actually works.
Alright, closing this in favor of the merged https://github.com/ggml-org/llama.cpp/pull/16932. Seems to be working well. I'll keep this branch around just in case. :)