OpenAI API Streaming Error: 500 Unknown method llama.cpp with unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF
Unable to make it run qwen3-coder with llama.cpp server while using qwen-code cli client
Following tutorial here https://docs.unsloth.ai/basics/qwen3-coder#improving-generation-speed
./llama.cpp/build/bin/llama-server --model unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF/UD-Q2_K_XL/Qwen3-Coder-480B-A35B-Instruct-UD-Q2_K_XL-00001-of-00004.gguf \
--threads -1 --ctx-size 20000 --temp 0.7 --min-p 0.0 --top-p 0.8 --top-k 20 \
--repeat-penalty 1.05 --n-gpu-layers 200 -ot ".ffn_(up|down)_exps.=CPU" --jinja
ℹ Authenticated via "openai". │
│ ✖ OpenAI API Streaming Error: 500 Unknown method: keys at row 57, column 46: │
│ {%- set handled_keys = ['type', 'description', 'enum', 'required'] %} │
│ {%- for json_key in param_fields.keys() | reject("in", handled_keys) %} │
│ ^ │
│ {%- set normed_json_key = json_key | replace("-", "_") | replace(" │
│ ", "_") | replace("$", "") %} │
│ at row 57, column 81: │
│ {%- set handled_keys = ['type', 'description', 'enum', 'required'] %} │
│ {%- for json_key in param_fields.keys() | reject("in", handled_keys) %} │
│ ^ │
│ {%- set normed_json_key = json_key | replace("-", "_") | replace(" │
│ ", "_") | replace("$", "") %} │
│ at row 57, column 13: │
│ {%- set handled_keys = ['type', 'description', 'enum', 'required'] %} │
│ {%- for json_key in param_fields.keys() | reject("in", handled_keys) %} │
│ ^ │
│ {%- set normed_json_key = json_key | replace("-", "_") | replace(" │
│ ", "_") | replace("$", "") %} │
│ at row 46, column 80: │
│ {{- '\n<parameters>' }} │
│ {%- for param_name, param_fields in tool.parameters.properties|items %} │
│ ^ │
│ {{- '\n<parameter>' }} │
│ at row 46, column 9: │
│ {{- '\n<parameters>' }} │
│ {%- for param_name, param_fields in tool.parameters.properties|items %} │
│ ^ │
│ {{- '\n<parameter>' }} │
│ at row 39, column 29: │
│ {{- "<tools>" }} │
│ {%- for tool in tools %} │
│ ^ │
│ {%- if tool.function is defined %} │
│ at row 39, column 5: │
│ {{- "<tools>" }} │
│ {%- for tool in tools %} │
│ ^ │
│ {%- if tool.function is defined %} │
│ at row 36, column 51: │
│ {%- endif %} │
│ {%- if tools is iterable and tools | length > 0 %} │
│ ^ │
│ {{- "\n\nYou have access to the following functions:\n\n" }} │
│ at row 36, column 1: │
│ {%- endif %} │
│ {%- if tools is iterable and tools | length > 0 %} │
│ ^ │
│ {{- "\n\nYou have access to the following functions:\n\n" }} │
│ at row 1, column 1: │
│ {% macro render_item_list(item_list, tag_name='required') %} │
│ ^ │
│ {%- if item_list is defined and item_list is iterable and item_list | length > │
│ 0 %} │
│
I'm having this same issue. I can query Qwen3-Coder running on llama.cpp with simple requests, but if I attempt to use it for anything serious involving code, I get the same error as above.
Maybe the error is raised by the old jinja version which does not support the reject filter (required jinja version >=2.7, see doc).
Try replacing the following statement in the qwen3-coder jinja template
{%- for json_key in param_fields.keys() | reject("in", handled_keys) %}
into
{%- for json_key in param_fields if json_key not in handled_keys %}
The Qwen Coder team is actively checking other potential compatibility errors currently (any runtime problem is welcome), and will update the XML jinja template as well as the corresponding XML tool parser once completed.
Is there a recommended way to run this model on CPU while using qwen-code cli?