
Eval bug: Granite 4 template detection fails

Open · pwilkin opened this issue 2 months ago · 12 comments

Name and Version

ilintar@LinuksowaJaskinia:/mnt/win/k/models/unsloth/granite-4.0-h-small-GGUF$ llama-cli --version
load_backend: loaded BLAS backend from /devel/tools/llama.cpp/build/bin/libggml-blas.so
register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (OpenBLAS)
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3080, compute capability 8.6, VMM: yes
  Device 1: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes
load_backend: loaded CUDA backend from /devel/tools/llama.cpp/build/bin/libggml-cuda.so
register_backend: registered backend CUDA (2 devices)
register_device: registered device CUDA0 (NVIDIA GeForce RTX 3080)
register_device: registered device CUDA1 (NVIDIA GeForce RTX 5060 Ti)
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-icelake.so score: 0
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-skylakex.so score: 0
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-haswell.so score: 64
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-sse42.so score: 5
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-sandybridge.so score: 21
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-alderlake.so score: 0
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-x64.so score: 1
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-sapphirerapids.so score: 0
load_backend: loaded CPU backend from /devel/tools/llama.cpp/build/bin/libggml-cpu-haswell.so
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz)
version: 6539 (7ec2df64a)
built with cc (Ubuntu 14.2.0-19ubuntu2) 14.2.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CPU

Hardware

i7-9700K + RTX 3080 + RTX 5060 Ti

Models

granite-4.0-h-small-GGUF-Q8_0

Problem description & steps to reproduce

llama.cpp doesn't seem to recognize the Granite 4 Hybrid model's chat template, leading to a crash later on when the model tries to use tools. The template detection string in llama-chat.cpp is too specific; it seems to match only one of the old templates for the Tiny model.
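
For reference, detection is a plain substring scan over the Jinja template source, so a too-narrow needle silently falls through to another format. A minimal sketch of the current condition (the wrapper function name here is hypothetical; in the real code the check is inlined):

#include <string>

// Pre-fix Granite check: it only matches the old "elif thinking"
// template shipped with the Tiny model, so the newer Granite 4
// templates fall through to the Hermes 2 Pro fallback.
static bool is_granite_template(const std::string & src) {
    return src.find("elif thinking") != std::string::npos &&
           src.find("<|tool_call|>") != std::string::npos;
}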

First Bad Commit

No response

Relevant log output

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
warning: 56	../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S: No such file or directory
#0  __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
56	in ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S
#1  0x00007b8da2e9eb63 in __internal_syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=0, a6=0, nr=61) at ./nptl/cancellation.c:49
warning: 49	./nptl/cancellation.c: No such file or directory
#2  __syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=0, a6=a6@entry=0, nr=61) at ./nptl/cancellation.c:75
75	in ./nptl/cancellation.c
#3  0x00007b8da2f1ae9f in __GI___wait4 (pid=<optimized out>, stat_loc=<optimized out>, options=<optimized out>, usage=<optimized out>) at ../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30	../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
#4  0x00007b8da3546b71 in ggml_print_backtrace () at /devel/tools/llama.cpp/ggml/src/ggml.c:196
196	        waitpid(child_pid, NULL, 0);
#5  0x00007b8da355c393 in ggml_uncaught_exception () at /devel/tools/llama.cpp/ggml/src/ggml.cpp:9
9	    ggml_print_backtrace();
#6  0x00007b8da32c10aa in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007b8da32aaa9e in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007b8da32c1361 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00005886af1bde71 in nlohmann::json_abi_v3_12_0::detail::json_sax_dom_parser<nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>, nlohmann::json_abi_v3_12_0::detail::iterator_input_adapter<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::parse_error<nlohmann::json_abi_v3_12_0::detail::parse_error> (this=0x7ffd12a16260, ex=...) at /devel/tools/llama.cpp/common/../vendor/nlohmann/json.hpp:8983
8983	            JSON_THROW(ex);
#10 0x00005886af1bd899 in nlohmann::json_abi_v3_12_0::detail::parser<nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>, nlohmann::json_abi_v3_12_0::detail::iterator_input_adapter<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::sax_parse_internal<nlohmann::json_abi_v3_12_0::detail::json_sax_dom_parser<nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>, nlohmann::json_abi_v3_12_0::detail::iterator_input_adapter<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > (this=0x7ffd12a16460, sax=0x7ffd12a16260) at /devel/tools/llama.cpp/common/../vendor/nlohmann/json.hpp:13324
13324	            return sax->parse_error(m_lexer.get_position(),
#11 0x00005886af19a2c9 in nlohmann::json_abi_v3_12_0::detail::parser<nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>, nlohmann::json_abi_v3_12_0::detail::iterator_input_adapter<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::parse (this=0x7ffd12a16460, strict=true, result=...) at /devel/tools/llama.cpp/common/../vendor/nlohmann/json.hpp:12984
12984	            sax_parse_internal(&sdp);
#12 0x00005886af277ded in nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>::parse<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&> (i="\n{\"name\": \"create_text_file\", \"arguments\": {\"relative_path\":\"/devel/alt/random/isorpg/src/main.js\",\"content\":\"// main.js \\u2013 entry point for the Three.js Isometric RPG\\n\\n// Import core Three.js mo"..., cb=..., allow_exceptions=true, ignore_comments=false) at /devel/tools/llama.cpp/common/../vendor/nlohmann/json.hpp:24063
24063	        parser(detail::input_adapter(std::forward<InputType>(i)), std::move(cb), allow_exceptions, ignore_comments).parse(true, result); // cppcheck-suppress[accessMoved,accessForwarded]
#13 0x00005886af3d7be6 in common_json_parse (it=10 '\n', end=0 '\000', healing_marker="1910210050", out=...) at /devel/tools/llama.cpp/common/json-partial.cpp:245
245	            out.json = json::parse(str);
#14 0x00005886af3d19ff in common_chat_msg_parser::try_consume_json (this=0x7ffd12a16e80) at /devel/tools/llama.cpp/common/chat-parser.cpp:237
237	    if (!common_json_parse(it, end, healing_marker_, result)) {
#15 0x00005886af3d2ce3 in common_chat_msg_parser::try_consume_json_with_dumped_args (this=0x7ffd12a16e80, args_paths=std::vector of length 1, capacity 1 = {...}, content_paths=std::vector of length 0, capacity 0) at /devel/tools/llama.cpp/common/chat-parser.cpp:272
272	    auto partial = try_consume_json();
#16 0x00005886af2d8674 in common_chat_parse_hermes_2_pro (builder=...) at /devel/tools/llama.cpp/common/chat.cpp:2109
2109	            if (auto tool_call = builder.try_consume_json_with_dumped_args({{"arguments"}})) {
#17 0x00005886af2e05fc in common_chat_parse (builder=...) at /devel/tools/llama.cpp/common/chat.cpp:2686
2686	            common_chat_parse_hermes_2_pro(builder);
#18 0x00005886af2e083b in common_chat_parse (input="<tool_call>\n{\"name\": \"create_text_file\", \"arguments\": {\"relative_path\":\"/devel/alt/random/isorpg/src/main.js\",\"content\":\"// main.js \\u2013 entry point for the Three.js Isometric RPG\\n\\n// Import core "..., is_partial=true, syntax=...) at /devel/tools/llama.cpp/common/chat.cpp:2715
2715	        common_chat_parse(builder);
#19 0x00005886af13567c in server_slot::update_chat_msg (this=0x5886d39e9220, diffs=std::vector of length 0, capacity 0) at /devel/tools/llama.cpp/tools/server/server.cpp:1620
1620	            params.oaicompat_chat_syntax);
#20 0x00005886af142443 in server_context::send_partial_response (this=0x7ffd12a1a230, slot=..., tkn=..., is_progress=false) at /devel/tools/llama.cpp/tools/server/server.cpp:2776
2776	            slot.update_chat_msg(res->oaicompat_msg_diffs);
#21 0x00005886af140ff6 in server_context::process_token (this=0x7ffd12a1a230, result=..., slot=...) at /devel/tools/llama.cpp/tools/server/server.cpp:2568
2568	                send_partial_response(slot, result, false);
#22 0x00005886af1499e1 in server_context::update_slots (this=0x7ffd12a1a230) at /devel/tools/llama.cpp/tools/server/server.cpp:3922
3922	                if (!process_token(result, slot)) {
#23 0x00005886af0e5b13 in operator() (__closure=0x7ffd12a1b9a0) at /devel/tools/llama.cpp/tools/server/server.cpp:5384
5384	        ctx_server.update_slots();
#24 0x00005886af0f492a in std::__invoke_impl<void, main(int, char**)::<lambda()>&>(std::__invoke_other, struct {...} &) (__f=...) at /usr/include/c++/14/bits/invoke.h:61
61	    { return std::forward<_Fn>(__f)(std::forward<_Args>(__args)...); }
#25 0x00005886af0f26f0 in std::__invoke_r<void, main(int, char**)::<lambda()>&>(struct {...} &) (__fn=...) at /usr/include/c++/14/bits/invoke.h:111
111	        std::__invoke_impl<__type>(__tag{}, std::forward<_Callable>(__fn),
#26 0x00005886af0ee373 in std::_Function_handler<void(), main(int, char**)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/14/bits/std_function.h:290
290	        return std::__invoke_r<_Res>(*_Base::_M_get_pointer(__functor),
#27 0x00005886af14fdc4 in std::function<void()>::operator() (this=0x7ffd12a1b9a0) at /usr/include/c++/14/bits/std_function.h:591
591	        return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
#28 0x00005886af138f8d in server_queue::start_loop (this=0x7ffd12a1b880) at /devel/tools/llama.cpp/tools/server/server.cpp:1918
1918	            callback_update_slots();
#29 0x00005886af0e835e in main (argc=10, argv=0x7ffd12a1bc88) at /devel/tools/llama.cpp/tools/server/server.cpp:5411
5411	    ctx_server.queue_tasks.start_loop();

pwilkin · Oct 03 '25 19:10

My tool-calling works fine on MoE-Granite-4.0-h-small-32B, but since I currently use a custom proxy where I handle the parsing myself, I’d much rather understand and leverage the native tool-calling logic in llama.cpp. That way I can drop my SSE proxy entirely and help fix or improve the community’s parsing code instead of maintaining my own. I’ll take some time to read through the code and understand how it works internally.

ServeurpersoCom · Oct 12 '25 08:10

The current Granite codepath in llama.cpp is incomplete: template detection fails and falls back to the Hermes 2 Pro parser. When that happens, the runtime applies Hermes-style parsing rules, which expect <tool_call> objects, to Granite’s <|tool_call|> JSON array output. As a result, any partial generation containing <|tool_call|>[{…}] triggers an nlohmann::json parse error and crashes the stream before server_slot::update_chat_msg() can assign tool-call IDs.
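
For illustration, a minimal standalone sketch (hypothetical payloads; nlohmann/json assumed, as vendored in llama.cpp) of the array shape a Granite-aware parser has to consume:

#include <iostream>
#include <string>
#include <nlohmann/json.hpp>

int main() {
    // Hermes 2 Pro: one JSON object per <tool_call>...</tool_call> block.
    // Granite: one <|tool_call|> token followed by a JSON array of calls.
    const std::string granite_payload =
        R"([{"name":"add","arguments":{"x":1,"y":2}},
            {"name":"multiply","arguments":{"a":3,"b":4}}])";

    // Parsing the payload as the array it actually is works fine...
    for (const auto & call : nlohmann::json::parse(granite_payload)) {
        std::cout << call["name"] << ": " << call["arguments"].dump() << "\n";
    }
    // ...whereas a Hermes-style parser expecting a single object (and fed
    // truncated streaming chunks of this text) raises the parse_error
    // shown in the backtrace.
    return 0;
}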

ServeurpersoCom · Oct 12 '25 08:10

My proxy is just an intermediate OpenAI-compatible client: it doesn’t change or reinterpret tool-calling logic, it simply forwards requests and SSE chunks. I’m keeping it for now, but I’m trying to understand why tool calls work perfectly with Granite on my setup, while the same model sometimes fails under llama-server. It might be that SSE mode works fine while non-streaming requests trigger the fallback... I’ll test that, since detection should follow the same path.

ServeurpersoCom · Oct 12 '25 08:10

Special characters (including Unicode and emoji) pass correctly in tool calls, but it doesn't like the \u2013! [screenshot]

ServeurpersoCom · Oct 12 '25 08:10

The current Granite codepath in llama.cpp is incomplete: template detection fails and falls back to the Hermes 2 Pro parser. When that happens, the runtime applies Hermes-style parsing rules, which expect <tool_call> objects, to Granite’s <|tool_call|> JSON array output. As a result, any partial generation containing <|tool_call|>[{…}] triggers an nlohmann::json parse error and crashes the stream before server_slot::update_chat_msg() can assign tool-call IDs.

Yep. Just need to fix the template detection code to use stuff specific to the new Granite templates as well (would do it, but busy with you-know-what).

pwilkin · Oct 12 '25 13:10

Turns out the entire Granite 4 detection issue was just this one missing condition block! Unbelievable. I'll test it thoroughly and add it to the validation suite!

(root|~/llama.cpp.pascal) git diff
diff --git a/common/chat.cpp b/common/chat.cpp
index 8587140e1..37f091ec3 100644
--- a/common/chat.cpp
+++ b/common/chat.cpp
@@ -2719,7 +2719,10 @@ static common_chat_params common_chat_templates_apply_jinja(
     }

     // Granite (IBM) - detects thinking / tools support
-    if (src.find("elif thinking") != std::string::npos && src.find("<|tool_call|>") != std::string::npos) {
+    if (src.find("<|tool_call|>") != std::string::npos &&
+        (src.find("elif thinking") != std::string::npos ||
+         src.find("tools_system_message_prefix") != std::string::npos ||
+         src.find("You may call one or more tools to assist with the user query.") != std::string::npos)) {
         return common_chat_params_init_granite(tmpl, params);
     }

diff --git a/tools/server/public/index.html.gz b/tools/server/public/index.html.gz
index d0c44534e..bb2f3897a 100644
Binary files a/tools/server/public/index.html.gz and b/tools/server/public/index.html.gz differ

I realize now that my setup still goes through llama-server, but my tool calls were simple enough that the fallback Hermes parser never hit malformed JSON. Granite produced clean, single-block tool calls without Unicode escapes or partial JSON chunks, so llama-server didn’t crash even though it misdetected the template. The issue only surfaces when Granite streams <|tool_call|>[{...}] with escaped Unicode or fragmented JSON: that’s when the wrong parser blows up. But now that #16526 has been merged, the fix ensures Granite detection works properly and everything stays clean.
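
A minimal repro sketch of that failure mode (hypothetical fragment; nlohmann/json assumed): a streamed chunk cut inside a \uXXXX escape is not valid JSON, so a plain json::parse throws exactly the parse_error class seen in the backtrace:

#include <iostream>
#include <string>
#include <nlohmann/json.hpp>

int main() {
    // A Granite tool-call stream truncated mid-escape (\u2013 is the
    // en dash from the log above):
    const std::string chunk =
        R"([{"name":"create_text_file","arguments":{"content":"entry point \u20)";
    try {
        auto j = nlohmann::json::parse(chunk);
    } catch (const nlohmann::json::parse_error & e) {
        // llama.cpp's json-partial healing is supposed to absorb this,
        // but only when the right chat format/parser was selected.
        std::cout << "parse_error: " << e.what() << "\n";
    }
    return 0;
}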

ServeurpersoCom · Oct 12 '25 14:10

https://huggingface.co/ibm-granite/granite-4.0-h-small?chat_template=default

Granite actually uses plain XML-style tags (<tool_call>...</tool_call>) instead of the Llama-style <|tool_call|>. If we don’t properly distinguish the two formats, the Granite condition will end up matching Hermes as well, effectively “absorbing” Hermes detection because of its earlier position in the code.
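
A quick way to see the collision risk (the template snippets here are hypothetical stand-ins, not the real Jinja sources):

#include <cassert>
#include <string>

int main() {
    const std::string hermes_tmpl = "...<tool_call>{{ tool }}</tool_call>...";
    const std::string granite_tmpl =
        "...{{ tools_system_message_prefix }}...<tool_call>[...]...";

    // Both families use the same plain tag, so a needle built from it
    // cannot disambiguate; a Granite branch ordered before Hermes would
    // absorb Hermes templates:
    assert(hermes_tmpl.find("<tool_call>")  != std::string::npos);
    assert(granite_tmpl.find("<tool_call>") != std::string::npos);

    // A needle unique to the Granite template text does disambiguate:
    assert(hermes_tmpl.find("tools_system_message_prefix")  == std::string::npos);
    assert(granite_tmpl.find("tools_system_message_prefix") != std::string::npos);
    return 0;
}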

ServeurpersoCom · Oct 12 '25 15:10

Chat format detection is done based on keywords in the template though, not on the matching fragments of the actual chat :)

pwilkin · Oct 12 '25 15:10

Yeah, I'm aware the chat format detection is keyword-based on the template, not driven by the message fragments 🙂 What I'm trying to find now is a reproducible test that clearly shows the behavioral difference between a llama-server falling back to Hermes and a patched one correctly detecting Granite. I suspect a curl test with two parallel tool calls in a single completion should expose it: Granite handles multi-tool output properly, while Hermes tends to flatten or merge them.

ServeurpersoCom · Oct 12 '25 15:10

Server logs chat format for messages by default, if I remember correctly. Unless you really need to detect it client-side.

pwilkin · Oct 12 '25 15:10

No patch:

srv  params_from_: Chat format: Hermes 2 Pro
curl https://www.serveurperso.com/ia/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "model": "MoE-Granite-4.0-h-small-32B",
    "stream": true,
    "parallel_tool_calls": true,
    "messages": [
      {
        "role": "system",
        "content": "Follow the Granite tool-call format exactly, including the <|tool_call|> JSON list when you invoke more than one tool."
      },
      {
        "role": "user",
        "content": "Call add with x=1,y=2 and multiply with a=3,b=4; respond only with the tool list."
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "add",
          "description": "Add two integers",
          "parameters": {
            "type": "object",
            "properties": {
              "x": { "type": "integer" },
              "y": { "type": "integer" }
            },
            "required": ["x", "y"]
          }
        }
      },
      {
        "type": "function",
        "function": {
          "name": "multiply",
          "description": "Multiply two integers",
          "parameters": {
            "type": "object",
            "properties": {
              "a": { "type": "integer" },
              "b": { "type": "integer" }
            },
            "required": ["a", "b"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"role":"assistant","content":null}}],"created":1760283726,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"id":"SAh7r2xqDVkPbRkoifqjOiHU2O8pIDpt","type":"function","function":{"name":"add","arguments":""}}]}}],"created":1760283726,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\""}}]}}],"created":1760283726,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"x"}}]}}],"created":1760283726,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\":"}}]}}],"created":1760283726,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"1,"}}]}}],"created":1760283726,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\""}}]}}],"created":1760283726,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"y"}}]}}],"created":1760283726,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\":"}}]}}],"created":1760283726,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"2}"}}]}}],"created":1760283726,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"id":"I9M7sgscQcnWIH4wIG8LEp2SjhkC9lgk","type":"function","function":{"name":"multiply","arguments":""}}]}}],"created":1760283727,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"{\""}}]}}],"created":1760283727,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"a"}}]}}],"created":1760283727,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"\":"}}]}}],"created":1760283727,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"3,"}}]}}],"created":1760283727,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"\""}}]}}],"created":1760283727,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"b"}}]}}],"created":1760283727,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"\":"}}]}}],"created":1760283727,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"4}"}}]}}],"created":1760283727,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":"tool_calls","index":0,"delta":{}}],"created":1760283727,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk","timings":{"cache_n":241,"prompt_n":64,"prompt_ms":67.228,"prompt_per_token_ms":1.0504375,"prompt_per_second":951.9842922591778,"predicted_n":50,"predicted_ms":557.635,"predicted_per_token_ms":11.1527,"predicted_per_second":89.66438620244426}}

data: [DONE]

With the (new, experimental) patch:

srv  params_from_: Chat format: Granite
curl https://www.serveurperso.com/ia/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "model": "MoE-Granite-4.0-h-small-32B",
    "stream": true,
    "parallel_tool_calls": true,
    "messages": [
      {
        "role": "system",
        "content": "Follow the Granite tool-call format exactly, including the <|tool_call|> JSON list when you invoke more than one tool."
      },
      {
        "role": "user",
        "content": "Call add with x=1,y=2 and multiply with a=3,b=4; respond only with the tool list."
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "add",
          "description": "Add two integers",
          "parameters": {
            "type": "object",
            "properties": {
              "x": { "type": "integer" },
              "y": { "type": "integer" }
            },
            "required": ["x", "y"]
          }
        }
      },
      {
        "type": "function",
        "function": {
          "name": "multiply",
          "description": "Multiply two integers",
          "parameters": {
            "type": "object",
            "properties": {
              "a": { "type": "integer" },
              "b": { "type": "integer" }
            },
            "required": ["a", "b"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"role":"assistant","content":null}}],"created":1760284180,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"id":"D9q4DM8rZFthCzDN4CRwgxzuRDu9S4Z4","type":"function","function":{"name":"add","arguments":""}}]}}],"created":1760284180,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\""}}]}}],"created":1760284180,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"x"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\":"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"1,"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\""}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"y"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\":"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"2}"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"id":"09co4vbs8aMyaAAHXPsBYCW1uZyepv0A","type":"function","function":{"name":"multiply","arguments":""}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"{\""}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"a"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"\":"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"3,"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"\""}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"b"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"\":"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"4}"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":"tool_calls","index":0,"delta":{}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk","timings":{"cache_n":0,"prompt_n":305,"prompt_ms":181.719,"prompt_per_token_ms":0.5958,"prompt_per_second":1678.4155756965424,"predicted_n":50,"predicted_ms":559.107,"predicted_per_token_ms":11.182139999999999,"predicted_per_second":89.428320518255}}

data: [DONE]

ServeurpersoCom · Oct 12 '25 15:10

It’s a bit tricky to prove, since the responses look almost identical: the only clear evidence I have is the template detection line and the change in prompt length:

  • "prompt_n": 64
  • "prompt_n": 305

That’s the best indicator that the Granite template is now being properly detected and applied instead of falling back to Hermes. But that’s not enough for me to build a proper test suite yet, nor to clearly demonstrate a perceptible improvement in behavior!

ServeurpersoCom · Oct 12 '25 16:10

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] · Nov 27 '25 01:11