
qwen3-coder tool call parser

Open marceldev89 opened this issue 2 months ago • 7 comments

[!NOTE] Original work and PR by bold84 @ https://github.com/ggml-org/llama.cpp/pull/15019

This pull request resolves #15012 and introduces comprehensive support for the Qwen3-Coder model family's XML-based tool-calling format. It includes a new, robust XML parser and updated chat template detection logic to ensure reliable function calling.

Key Changes:

  1. New XML Parser (common/chat-parser.cpp):

    • A dedicated, non-streaming XML parser has been implemented to handle Qwen3-Coder's specific output format (a rough sketch of the idea follows this list).
    • Features include robust attribute parsing, improved error reporting, and efficient function lookups using a hash set.
  2. Chat Template Detection (common/chat.h, common/chat.cpp):

    • The chat template detection logic has been updated to correctly identify Qwen3-Coder models, preventing conflicts with other formats such as Hermes 2 (a second sketch after the list illustrates one way to do this).
    • Ensures the QWEN3_CODER_XML format is applied consistently, even when no tools are explicitly provided in the request.
  3. Comprehensive tests (tests/test-chat.cpp):

    • Tests covering the parser logic have been implemented.
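
For illustration, here is a minimal C++ sketch of the kind of extraction such a parser has to perform, assuming the <tool_call> / <function=NAME> / <parameter=KEY> layout Qwen3-Coder emits (visible in a comment further down this thread). All names and the structure are invented for the example; this is not the actual common/chat-parser.cpp code:

```cpp
// Hypothetical sketch of extracting a Qwen3-Coder style tool call.
// NOT the actual common/chat-parser.cpp implementation.
#include <iostream>
#include <string>
#include <utility>
#include <vector>

struct ToolCall {
    std::string name;
    std::vector<std::pair<std::string, std::string>> params;
};

// Copy the text between `open` and `close` into `out`, advancing `pos`
// past the closing marker. Returns false if either marker is missing.
static bool extract_between(const std::string & s, const std::string & open,
                            const std::string & close, size_t & pos,
                            std::string & out) {
    size_t start = s.find(open, pos);
    if (start == std::string::npos) return false;
    start += open.size();
    size_t end = s.find(close, start);
    if (end == std::string::npos) return false;
    out = s.substr(start, end - start);
    pos = end + close.size();
    return true;
}

static bool parse_tool_call(const std::string & text, ToolCall & call) {
    size_t pos = 0;
    std::string body;
    if (!extract_between(text, "<tool_call>", "</tool_call>", pos, body)) return false;

    // <function=NAME> ... </function>
    size_t bpos = 0;
    std::string fn;
    if (!extract_between(body, "<function=", "</function>", bpos, fn)) return false;
    size_t gt = fn.find('>');
    if (gt == std::string::npos) return false;
    call.name = fn.substr(0, gt);
    fn = fn.substr(gt + 1);

    // Zero or more <parameter=KEY> VALUE </parameter> blocks.
    size_t ppos = 0;
    std::string p;
    while (extract_between(fn, "<parameter=", "</parameter>", ppos, p)) {
        size_t pgt = p.find('>');
        if (pgt == std::string::npos) return false;
        // Surrounding whitespace is kept here; a real parser would trim it
        // and coerce values to the types declared in the tool schema.
        call.params.emplace_back(p.substr(0, pgt), p.substr(pgt + 1));
    }
    return true;
}

int main() {
    const std::string sample =
        "<tool_call>\n<function=get_weather>\n<parameter=city>\nParis\n"
        "</parameter>\n</function>\n</tool_call>";
    ToolCall call;
    if (parse_tool_call(sample, call)) {
        std::cout << "function: " << call.name << "\n";
        for (const auto & kv : call.params) {
            std::cout << kv.first << " = " << kv.second << "\n";
        }
    }
}
```

The robustness features listed above (attribute parsing, error reporting, hash-set function lookups) are exactly the parts this toy version leaves out.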
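
And a sketch of the detection side: one plausible way to keep Qwen3-Coder templates from being misclassified as Hermes 2 is to probe for markers only the XML format uses. Again, the enum and markers are assumptions for illustration, not the actual common/chat.cpp code:

```cpp
// Hypothetical illustration of chat-template detection by marker strings.
// The real common/chat.cpp logic may differ; the enum values and markers
// probed here are assumptions based on this thread.
#include <iostream>
#include <string>

enum class ChatFormat { GENERIC, HERMES_2_PRO, QWEN3_CODER_XML };

static ChatFormat detect_format(const std::string & tmpl) {
    // Qwen3-Coder templates render tool calls with <function=...> and
    // <parameter=...> tags, which Hermes-style templates (plain
    // <tool_call> wrapping JSON) do not. Check those first, since both
    // template families mention <tool_call>.
    if (tmpl.find("<function=") != std::string::npos &&
        tmpl.find("<parameter=") != std::string::npos) {
        return ChatFormat::QWEN3_CODER_XML;
    }
    if (tmpl.find("<tool_call>") != std::string::npos) {
        return ChatFormat::HERMES_2_PRO;
    }
    return ChatFormat::GENERIC;
}

int main() {
    // Stand-in template fragments, not the real Jinja templates.
    std::cout << (detect_format("<tool_call><function=f><parameter=x>")
                  == ChatFormat::QWEN3_CODER_XML) << "\n";  // prints 1
    std::cout << (detect_format("<tool_call>{\"name\": ...}</tool_call>")
                  == ChatFormat::HERMES_2_PRO) << "\n";     // prints 1
}
```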

Known issues:

  • The model (Qwen3-Coder-30B-A3B-Instruct-UD-Q*_K_XL.gguf) occasionally omits the leading <tool_call> tag before a tool call. This seems to be an issue with the model itself(?).

marceldev89 avatar Oct 24 '25 10:10 marceldev89

Anecdotally, I observed that the previous PR (and presumably this PR too) essentially fixed tool calling for qwen3-coder. Although when trying to use it with codex, qwen3-coder absolutely refuses to use the apply_patch tool, opting to use sed instead, which is probably just a training issue?

It would be nice to get this PR merged in.

coder543 avatar Oct 24 '25 14:10 coder543

> Anecdotally, I observed that the previous PR (and presumably this PR too) essentially fixed tool calling for qwen3-coder. Although when trying to use it with codex, qwen3-coder absolutely refuses to use the apply_patch tool, opting to use sed instead, which is probably just a training issue?
>
> It would be nice to get this PR merged in.

I guess you could test it through OpenRouter or something and check whether you see the same behavior there as well. My guess would be that it's a model thing and not so much this PR. Or maybe even a codex thing, since codex is probably heavily optimized for GPT models in terms of system prompt and tool descriptions.

marceldev89 avatar Oct 24 '25 14:10 marceldev89

Hey, just to confirm that running this branch fixes the integration with Qwen3-Coder-30B-A3B.

Reproduction steps:

# Compile this branch
mkdir -p $HOME/bin && cd $HOME/bin
git clone https://github.com/marceldev89/llama.cpp.git llama.cpp-fork-sources && cd llama.cpp-fork-sources
cmake -Bbuild && cmake --build build --target llama-server --parallel

# Install qwen
brew install qwen-coder

# Launch model
$HOME/bin/llama.cpp-fork-sources/build/bin/llama-server --port 8012 --host 0.0.0.0 --jinja -ngl 99 -c 300000 -m $HOME/.lmstudio/models/hf.co/hf.co-unsloth-Qwen3-Coder-30B-A3B-Instruct-GGUF-UD-Q4-K-XL-GGUF/hf.co-unsloth-Qwen3-Coder-30B-A3B-Instruct-GGUF-UD-Q4-K-XL.gguf

# Launch qwen
OPENAI_API_KEY=no OPENAI_BASE_URL=http://localhost:8012/v1 OPENAI_MODEL=models/hf.co-unsloth-Qwen3-Coder-30B-A3B-Instruct-GGUF-UD-Q4-K-XL.gguf qwen

PS: I opened too many tabs to figure this out and can't find the sources any more to cite them properly. I invented nothing here; credit goes to whoever wrote the pieces first.

MartyLake avatar Oct 24 '25 20:10 MartyLake

@MartyLake can you also try opencode and check whether it works well? https://github.com/sst/opencode/issues/1890

grigio avatar Nov 06 '25 03:11 grigio

I've confirmed that this branch also fixes the issue on opencode.ai. I tested with unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q4_K_M.

ggml-org:b7018: [screenshot: Screenshot_20251111_143319]

marceldev89:qwen3-coder_tool_call_parser (tool-call:todowrite is handled properly): [screenshot: Screenshot_20251111_141841]

iwauo avatar Nov 12 '25 02:11 iwauo

This feature is really needed, please merge this PR as soon as possible.

przutto avatar Nov 12 '25 15:11 przutto

@grigio

> can you also try opencode and check whether it works well?

can confirm it does work well with opencode too.

MartyLake avatar Nov 12 '25 20:11 MartyLake

Hi, I'm using the docker image "rocm/llama.cpp:${TAG}_server" with TAG="llama.cpp-b6652.amd0_rocm7.0.0_ubuntu24.04" and MODEL="Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf", and I'm getting incomplete tags in LLM responses like the one below. Would this PR help?

Let me look for any files that might contain the pattern we're looking for, perhaps using a different approach:
<tool_call>
<function=mcphub__list_directory
<parameter=path
/home/sarunas/dev/subdir
</parameter>
</function>
</tool_call>

I'm trying to figure out whether I should build it myself, as I've seen reports elsewhere that it helped :)

svalaskevicius avatar Nov 18 '25 12:11 svalaskevicius

Sorry for the silly question above.

This PR does help, thanks!

svalaskevicius avatar Nov 18 '25 17:11 svalaskevicius

Seems like another PR (https://github.com/ggml-org/llama.cpp/pull/16932) merged a few days ago makes this obsolete. I'm giving it a quick test with Qwen3 Coder to see if it actually works.

marceldev89 avatar Nov 21 '25 13:11 marceldev89

Alright, closing this in favor of the merged https://github.com/ggml-org/llama.cpp/pull/16932. Seems to be working well. I'll keep this branch around just in case. :)

marceldev89 avatar Nov 21 '25 13:11 marceldev89