Eval bug: Granite 4 template detection fails
Name and Version
ilintar@LinuksowaJaskinia:/mnt/win/k/models/unsloth/granite-4.0-h-small-GGUF$ llama-cli --version
load_backend: loaded BLAS backend from /devel/tools/llama.cpp/build/bin/libggml-blas.so
register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (OpenBLAS)
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3080, compute capability 8.6, VMM: yes
  Device 1: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes
load_backend: loaded CUDA backend from /devel/tools/llama.cpp/build/bin/libggml-cuda.so
register_backend: registered backend CUDA (2 devices)
register_device: registered device CUDA0 (NVIDIA GeForce RTX 3080)
register_device: registered device CUDA1 (NVIDIA GeForce RTX 5060 Ti)
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-icelake.so score: 0
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-skylakex.so score: 0
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-haswell.so score: 64
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-sse42.so score: 5
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-sandybridge.so score: 21
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-alderlake.so score: 0
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-x64.so score: 1
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-sapphirerapids.so score: 0
load_backend: loaded CPU backend from /devel/tools/llama.cpp/build/bin/libggml-cpu-haswell.so
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz)
version: 6539 (7ec2df64a)
built with cc (Ubuntu 14.2.0-19ubuntu2) 14.2.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CPU
Hardware
i7-9700K + RTX 3080 + RTX 5060 Ti
Models
granite-4.0-h-small-GGUF-Q8_0
Problem description & steps to reproduce
The Granite 4 Hybrid model doesn't seem to recognize its own chat template, leading to a crash later on when it tries to use tools. The template detection string in llama-chat.cpp is too specific; it seems to match only one of the old templates for the Tiny model.
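For context, the corresponding check in common/chat.cpp (the jinja path used by llama-server, and what the fix further down in this thread patches) is a plain substring match on the template source:

// Granite (IBM) - detects thinking / tools support
// Fires only when the template contains BOTH markers; the newer Granite 4
// templates apparently no longer contain "elif thinking", so this never
// matches and detection falls through to the Hermes 2 Pro fallback.
if (src.find("elif thinking") != std::string::npos && src.find("<|tool_call|>") != std::string::npos) {
    return common_chat_params_init_granite(tmpl, params);
}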
First Bad Commit
No response
Relevant log output
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
warning: 56 ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S: No such file or directory
#0 __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
56 in ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S
#1 0x00007b8da2e9eb63 in __internal_syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=0, a6=0, nr=61) at ./nptl/cancellation.c:49
warning: 49 ./nptl/cancellation.c: No such file or directory
#2 __syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=0, a6=a6@entry=0, nr=61) at ./nptl/cancellation.c:75
75 in ./nptl/cancellation.c
#3 0x00007b8da2f1ae9f in __GI___wait4 (pid=<optimized out>, stat_loc=<optimized out>, options=<optimized out>, usage=<optimized out>) at ../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30 ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
#4 0x00007b8da3546b71 in ggml_print_backtrace () at /devel/tools/llama.cpp/ggml/src/ggml.c:196
196 waitpid(child_pid, NULL, 0);
#5 0x00007b8da355c393 in ggml_uncaught_exception () at /devel/tools/llama.cpp/ggml/src/ggml.cpp:9
9 ggml_print_backtrace();
#6 0x00007b8da32c10aa in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x00007b8da32aaa9e in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#8 0x00007b8da32c1361 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#9 0x00005886af1bde71 in nlohmann::json_abi_v3_12_0::detail::json_sax_dom_parser<nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>, nlohmann::json_abi_v3_12_0::detail::iterator_input_adapter<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::parse_error<nlohmann::json_abi_v3_12_0::detail::parse_error> (this=0x7ffd12a16260, ex=...) at /devel/tools/llama.cpp/common/../vendor/nlohmann/json.hpp:8983
8983 JSON_THROW(ex);
#10 0x00005886af1bd899 in nlohmann::json_abi_v3_12_0::detail::parser<nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>, nlohmann::json_abi_v3_12_0::detail::iterator_input_adapter<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::sax_parse_internal<nlohmann::json_abi_v3_12_0::detail::json_sax_dom_parser<nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>, nlohmann::json_abi_v3_12_0::detail::iterator_input_adapter<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > (this=0x7ffd12a16460, sax=0x7ffd12a16260) at /devel/tools/llama.cpp/common/../vendor/nlohmann/json.hpp:13324
13324 return sax->parse_error(m_lexer.get_position(),
#11 0x00005886af19a2c9 in nlohmann::json_abi_v3_12_0::detail::parser<nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>, nlohmann::json_abi_v3_12_0::detail::iterator_input_adapter<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::parse (this=0x7ffd12a16460, strict=true, result=...) at /devel/tools/llama.cpp/common/../vendor/nlohmann/json.hpp:12984
12984 sax_parse_internal(&sdp);
#12 0x00005886af277ded in nlohmann::json_abi_v3_12_0::basic_json<nlohmann::json_abi_v3_12_0::ordered_map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_12_0::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>::parse<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&> (i="\n{\"name\": \"create_text_file\", \"arguments\": {\"relative_path\":\"/devel/alt/random/isorpg/src/main.js\",\"content\":\"// main.js \\u2013 entry point for the Three.js Isometric RPG\\n\\n// Import core Three.js mo"..., cb=..., allow_exceptions=true, ignore_comments=false) at /devel/tools/llama.cpp/common/../vendor/nlohmann/json.hpp:24063
24063 parser(detail::input_adapter(std::forward<InputType>(i)), std::move(cb), allow_exceptions, ignore_comments).parse(true, result); // cppcheck-suppress[accessMoved,accessForwarded]
#13 0x00005886af3d7be6 in common_json_parse (it=10 '\n', end=0 '\000', healing_marker="1910210050", out=...) at /devel/tools/llama.cpp/common/json-partial.cpp:245
245 out.json = json::parse(str);
#14 0x00005886af3d19ff in common_chat_msg_parser::try_consume_json (this=0x7ffd12a16e80) at /devel/tools/llama.cpp/common/chat-parser.cpp:237
237 if (!common_json_parse(it, end, healing_marker_, result)) {
#15 0x00005886af3d2ce3 in common_chat_msg_parser::try_consume_json_with_dumped_args (this=0x7ffd12a16e80, args_paths=std::vector of length 1, capacity 1 = {...}, content_paths=std::vector of length 0, capacity 0) at /devel/tools/llama.cpp/common/chat-parser.cpp:272
272 auto partial = try_consume_json();
#16 0x00005886af2d8674 in common_chat_parse_hermes_2_pro (builder=...) at /devel/tools/llama.cpp/common/chat.cpp:2109
2109 if (auto tool_call = builder.try_consume_json_with_dumped_args({{"arguments"}})) {
#17 0x00005886af2e05fc in common_chat_parse (builder=...) at /devel/tools/llama.cpp/common/chat.cpp:2686
2686 common_chat_parse_hermes_2_pro(builder);
#18 0x00005886af2e083b in common_chat_parse (input="<tool_call>\n{\"name\": \"create_text_file\", \"arguments\": {\"relative_path\":\"/devel/alt/random/isorpg/src/main.js\",\"content\":\"// main.js \\u2013 entry point for the Three.js Isometric RPG\\n\\n// Import core "..., is_partial=true, syntax=...) at /devel/tools/llama.cpp/common/chat.cpp:2715
2715 common_chat_parse(builder);
#19 0x00005886af13567c in server_slot::update_chat_msg (this=0x5886d39e9220, diffs=std::vector of length 0, capacity 0) at /devel/tools/llama.cpp/tools/server/server.cpp:1620
1620 params.oaicompat_chat_syntax);
#20 0x00005886af142443 in server_context::send_partial_response (this=0x7ffd12a1a230, slot=..., tkn=..., is_progress=false) at /devel/tools/llama.cpp/tools/server/server.cpp:2776
2776 slot.update_chat_msg(res->oaicompat_msg_diffs);
#21 0x00005886af140ff6 in server_context::process_token (this=0x7ffd12a1a230, result=..., slot=...) at /devel/tools/llama.cpp/tools/server/server.cpp:2568
2568 send_partial_response(slot, result, false);
#22 0x00005886af1499e1 in server_context::update_slots (this=0x7ffd12a1a230) at /devel/tools/llama.cpp/tools/server/server.cpp:3922
3922 if (!process_token(result, slot)) {
#23 0x00005886af0e5b13 in operator() (__closure=0x7ffd12a1b9a0) at /devel/tools/llama.cpp/tools/server/server.cpp:5384
5384 ctx_server.update_slots();
#24 0x00005886af0f492a in std::__invoke_impl<void, main(int, char**)::<lambda()>&>(std::__invoke_other, struct {...} &) (__f=...) at /usr/include/c++/14/bits/invoke.h:61
61 { return std::forward<_Fn>(__f)(std::forward<_Args>(__args)...); }
#25 0x00005886af0f26f0 in std::__invoke_r<void, main(int, char**)::<lambda()>&>(struct {...} &) (__fn=...) at /usr/include/c++/14/bits/invoke.h:111
111 std::__invoke_impl<__type>(__tag{}, std::forward<_Callable>(__fn),
#26 0x00005886af0ee373 in std::_Function_handler<void(), main(int, char**)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/14/bits/std_function.h:290
290 return std::__invoke_r<_Res>(*_Base::_M_get_pointer(__functor),
#27 0x00005886af14fdc4 in std::function<void()>::operator() (this=0x7ffd12a1b9a0) at /usr/include/c++/14/bits/std_function.h:591
591 return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
#28 0x00005886af138f8d in server_queue::start_loop (this=0x7ffd12a1b880) at /devel/tools/llama.cpp/tools/server/server.cpp:1918
1918 callback_update_slots();
#29 0x00005886af0e835e in main (argc=10, argv=0x7ffd12a1bc88) at /devel/tools/llama.cpp/tools/server/server.cpp:5411
5411 ctx_server.queue_tasks.start_loop();
My tool-calling works fine on MoE-Granite-4.0-h-small-32B, but since I currently use a custom proxy where I handle the parsing myself, I'd much rather understand and leverage the native tool-calling logic in llama.cpp. That way I can drop my SSE proxy entirely and help fix or improve the community's parsing code instead of maintaining my own. I'll take some time to read through the code and understand how it works internally.
The current Granite codepath in llama.cpp is incomplete: template detection fails and falls back to the Hermes 2 Pro parser. When this happens, the runtime applies Hermes-style parsing rules, which expect <tool_call> objects, to Granite output that uses the <|tool_call|> JSON array. As a result, any partial generation containing <|tool_call|>[{…}] triggers an nlohmann::json parse error and crashes the stream before server_slot::update_chat_msg() can assign tool-call IDs.
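A minimal sketch of that failure class, assuming only nlohmann/json (the real path runs through common_chat_parse_hermes_2_pro, common_json_parse, and its healing markers, per frames #13–#16 in the backtrace above):

#include <nlohmann/json.hpp>
#include <string>

int main() {
    // Roughly what the misdetected parser ends up handing to json::parse()
    // when a Granite tool-call stream is cut off mid-argument.
    std::string chunk =
        "[{\"name\": \"create_text_file\", \"arguments\": {\"content\": \"// main.js \\u2013 entry";
    // Deliberately no try/catch, to mirror the crash: the parse_error
    // escapes the server loop, std::terminate runs, and ggml's uncaught
    // exception handler prints a backtrace like the one above.
    auto j = nlohmann::json::parse(chunk);
    (void) j;
    return 0;
}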
My proxy is just an intermediate OpenAI-compatible client: it doesn't change or reinterpret tool-calling logic, it simply forwards requests and SSE chunks. I'm keeping it for now, but I'm trying to understand why tool calls work perfectly with Granite on my setup, while the same model sometimes fails under llama-server. It might be that SSE mode works fine while non-streaming requests trigger the fallback... I'll test that, since detection should follow the same path.
Special characters (including Unicode and emoji) pass correctly in tool calls. But it doesn't like the \u2013!
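Quick sanity check on that: the escape itself is legal JSON and parses fine when the document is complete; it's truncation inside the escape that breaks. A standalone sketch (assuming nlohmann/json):

#include <nlohmann/json.hpp>
#include <cassert>
#include <string>

int main() {
    // A complete document containing \u2013 (en dash) parses without issue.
    auto ok = nlohmann::json::parse(R"({"s": "a \u2013 b"})");
    assert(ok["s"].get<std::string>() == "a \xe2\x80\x93 b"); // UTF-8 bytes of U+2013

    // The same document cut off inside the escape sequence throws.
    bool threw = false;
    try {
        nlohmann::json::parse(R"({"s": "a \u20)");
    } catch (const nlohmann::json::parse_error &) {
        threw = true;
    }
    assert(threw);
    return 0;
}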
> The current Granite codepath in llama.cpp is incomplete: template detection fails and falls back to the Hermes 2 Pro parser. […]
Yep. Just need to fix the template detection code to match on strings specific to the new Granite templates as well (would do it, but busy with you-know-what).
Turns out the entire Granite 4 detection issue was just this tiny missing condition block! Unbelievable. I'll test it thoroughly and add it to the validation suite!
(root|~/llama.cpp.pascal) git diff
diff --git a/common/chat.cpp b/common/chat.cpp
index 8587140e1..37f091ec3 100644
--- a/common/chat.cpp
+++ b/common/chat.cpp
@@ -2719,7 +2719,10 @@ static common_chat_params common_chat_templates_apply_jinja(
     }
 
     // Granite (IBM) - detects thinking / tools support
-    if (src.find("elif thinking") != std::string::npos && src.find("<|tool_call|>") != std::string::npos) {
+    if (src.find("<|tool_call|>") != std::string::npos &&
+        (src.find("elif thinking") != std::string::npos ||
+         src.find("tools_system_message_prefix") != std::string::npos ||
+         src.find("You may call one or more tools to assist with the user query.") != std::string::npos)) {
         return common_chat_params_init_granite(tmpl, params);
     }
 
diff --git a/tools/server/public/index.html.gz b/tools/server/public/index.html.gz
index d0c44534e..bb2f3897a 100644
Binary files a/tools/server/public/index.html.gz and b/tools/server/public/index.html.gz differ
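If it helps with the validation suite: the patched condition can be exercised in isolation. A standalone sketch below; the function name is hypothetical, since in the tree the check sits inline in common_chat_templates_apply_jinja:

#include <cassert>
#include <string>

// Hypothetical extraction of the patched predicate from common/chat.cpp.
static bool is_granite_template(const std::string & src) {
    return src.find("<|tool_call|>") != std::string::npos &&
           (src.find("elif thinking") != std::string::npos ||
            src.find("tools_system_message_prefix") != std::string::npos ||
            src.find("You may call one or more tools to assist with the user query.") != std::string::npos);
}

int main() {
    // Old (Tiny-era) Granite template: thinking branch + tool-call token.
    assert(is_granite_template("{% elif thinking %} ... <|tool_call|>"));
    // New Granite 4 template: no "elif thinking", but the new prefix marker.
    assert(is_granite_template("{{ tools_system_message_prefix }} ... <|tool_call|>"));
    // Hermes-style template: plain <tool_call> tags, must NOT match Granite.
    assert(!is_granite_template("<tool_call> ... </tool_call>"));
    return 0;
}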
I realize now that my setup still goes through llama-server, but my tool calls were simple enough that the fallback Hermes parser never hit malformed JSON. Granite produced clean, single-block tool calls without Unicode escapes or partial JSON chunks, so llama-server didn't crash even though it misdetected the template. The issue only surfaces when Granite streams <|tool_call|>[{...}] with escaped Unicode or fragmented JSON: that's when the wrong parser blows up. But now that #16526 has been merged, the fix ensures Granite detection works properly and everything stays clean.
https://huggingface.co/ibm-granite/granite-4.0-h-small?chat_template=default
Granite actually uses plain XML-style tags (<tool_call>...</tool_call>) instead of the Llama-style <|tool_call|>. If we don't properly distinguish the two formats, the Granite condition will end up matching Hermes as well, effectively "absorbing" Hermes detection because of its position earlier in the code.
Chat format detection is based on keywords in the template though, not on matching fragments of the actual chat :)
Yeah, I'm aware the chat format detection is keyword-based on the template, not driven by the message fragments 🙂 What I'm trying to find now is a reproducible test that clearly shows the behavioral difference between a llama-server falling back to Hermes and a patched one correctly detecting Granite. I suspect a curl test with two parallel tool calls in a single completion should expose it: Granite handles multi-tool output properly, while Hermes tends to flatten or merge them.
Server logs chat format for messages by default, if I remember correctly. Unless you really need to detect it client-side.
No patch:
srv params_from_: Chat format: Hermes 2 Pro
curl https://www.serveurperso.com/ia/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-d '{
"model": "MoE-Granite-4.0-h-small-32B",
"stream": true,
"parallel_tool_calls": true,
"messages": [
{
"role": "system",
"content": "Follow the Granite tool-call format exactly, including the <|tool_call|> JSON list when you invoke more than one tool."
},
{
"role": "user",
"content": "Call add with x=1,y=2 and multiply with a=3,b=4; respond only with the tool list."
}
],
"tools": [
{
"type": "function",
"function": {
"name": "add",
"description": "Add two integers",
"parameters": {
"type": "object",
"properties": {
"x": { "type": "integer" },
"y": { "type": "integer" }
},
"required": ["x", "y"]
}
}
},
{
"type": "function",
"function": {
"name": "multiply",
"description": "Multiply two integers",
"parameters": {
"type": "object",
"properties": {
"a": { "type": "integer" },
"b": { "type": "integer" }
},
"required": ["a", "b"]
}
}
}
],
"tool_choice": "auto"
}'
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"role":"assistant","content":null}}],"created":1760283726,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"id":"SAh7r2xqDVkPbRkoifqjOiHU2O8pIDpt","type":"function","function":{"name":"add","arguments":""}}]}}],"created":1760283726,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\""}}]}}],"created":1760283726,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"x"}}]}}],"created":1760283726,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\":"}}]}}],"created":1760283726,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"1,"}}]}}],"created":1760283726,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\""}}]}}],"created":1760283726,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"y"}}]}}],"created":1760283726,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\":"}}]}}],"created":1760283726,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"2}"}}]}}],"created":1760283726,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"id":"I9M7sgscQcnWIH4wIG8LEp2SjhkC9lgk","type":"function","function":{"name":"multiply","arguments":""}}]}}],"created":1760283727,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"{\""}}]}}],"created":1760283727,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"a"}}]}}],"created":1760283727,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"\":"}}]}}],"created":1760283727,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"3,"}}]}}],"created":1760283727,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"\""}}]}}],"created":1760283727,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"b"}}]}}],"created":1760283727,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"\":"}}]}}],"created":1760283727,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"4}"}}]}}],"created":1760283727,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":"tool_calls","index":0,"delta":{}}],"created":1760283727,"id":"chatcmpl-Z5GPRuP3Dajix4mkcVYXjuZuHflOjZUv","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk","timings":{"cache_n":241,"prompt_n":64,"prompt_ms":67.228,"prompt_per_token_ms":1.0504375,"prompt_per_second":951.9842922591778,"predicted_n":50,"predicted_ms":557.635,"predicted_per_token_ms":11.1527,"predicted_per_second":89.66438620244426}}
data: [DONE]
With (new experimental) patch:
srv params_from_: Chat format: Granite
curl https://www.serveurperso.com/ia/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-d '{
"model": "MoE-Granite-4.0-h-small-32B",
"stream": true,
"parallel_tool_calls": true,
"messages": [
{
"role": "system",
"content": "Follow the Granite tool-call format exactly, including the <|tool_call|> JSON list when you invoke more than one tool."
},
{
"role": "user",
"content": "Call add with x=1,y=2 and multiply with a=3,b=4; respond only with the tool list."
}
],
"tools": [
{
"type": "function",
"function": {
"name": "add",
"description": "Add two integers",
"parameters": {
"type": "object",
"properties": {
"x": { "type": "integer" },
"y": { "type": "integer" }
},
"required": ["x", "y"]
}
}
},
{
"type": "function",
"function": {
"name": "multiply",
"description": "Multiply two integers",
"parameters": {
"type": "object",
"properties": {
"a": { "type": "integer" },
"b": { "type": "integer" }
},
"required": ["a", "b"]
}
}
}
],
"tool_choice": "auto"
}'
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"role":"assistant","content":null}}],"created":1760284180,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"id":"D9q4DM8rZFthCzDN4CRwgxzuRDu9S4Z4","type":"function","function":{"name":"add","arguments":""}}]}}],"created":1760284180,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\""}}]}}],"created":1760284180,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"x"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\":"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"1,"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\""}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"y"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\":"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"2}"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"id":"09co4vbs8aMyaAAHXPsBYCW1uZyepv0A","type":"function","function":{"name":"multiply","arguments":""}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"{\""}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"a"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"\":"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"3,"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"\""}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"b"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"\":"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"4}"}}]}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":"tool_calls","index":0,"delta":{}}],"created":1760284181,"id":"chatcmpl-z4nmrl2Fl7EtbFAC0mXVuI4Ol1OnksvQ","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk","timings":{"cache_n":0,"prompt_n":305,"prompt_ms":181.719,"prompt_per_token_ms":0.5958,"prompt_per_second":1678.4155756965424,"predicted_n":50,"predicted_ms":559.107,"predicted_per_token_ms":11.182139999999999,"predicted_per_second":89.428320518255}}
data: [DONE]
It's a bit tricky to prove, since the responses look almost identical: the only clear evidence I have is the template detection line and the change in processed prompt tokens:
- "prompt_n": 64 (Hermes 2 Pro fallback)
- "prompt_n": 305 (Granite)

(Note the fallback run also reports "cache_n": 241, so both prompts total 305 tokens; the prompt_n jump mostly reflects a cold cache after the rendered template changed, which itself shows a different prompt was built.) That's the best indicator that the Granite template is now being properly detected and applied instead of falling back to Hermes. But that's not enough for me to build a proper test suite yet, nor to clearly demonstrate a perceptible improvement in behavior!
This issue was closed because it has been inactive for 14 days since being marked as stale.