common : implement parser combinators for chat parsing [WIP]
Putting this out there as a proof-of-concept and to gather feedback. It is still a WIP.
cc @pwilkin
Problem
Each model currently requires a custom parser to handle reasoning and tool calls. XML-based models are particularly challenging to parse. For example, Qwen3-Coder outputs:
<tool_call>
<function={name}>
<parameter={arg-name}>
{arg_value as json or string}
</parameter>
...
</function>
</tool_call>
Supporting this format requires the parser to know the type of each argument based on the provided schema.
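For instance, a call to a hypothetical `get_weather(city: string, days: number)` tool could be emitted as:

```
<tool_call>
<function=get_weather>
<parameter=city>
New York
</parameter>
<parameter=days>
3
</parameter>
</function>
</tool_call>
```

Without consulting the schema, a parser cannot tell whether `3` should be serialized as the JSON number `3` or the string `"3"`.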
Proposal
I propose using parser combinators to simplify parsing. We can compose parsers suitable for PEG grammars, which should handle model output effectively. This PR implements a proof-of-concept.
Here's an example from test/test-chat-parser-combinator.cpp:
// Parser for a fictitious model that outputs:
//
// <think>
// ... reasoning content ...
// </think>
// ... content ...
// <tool_call>
// <name>tool_name</name>
// <args>{ ... json args ... }</args>
// </tool_call>
//
auto parser = build_parser([](parser_builder & p) {
auto reasoning = p.add_rule("reasoning",
"<think>" << p.append_reasoning(p.until("</think>")) << "</think>");
auto content = p.add_rule("content",
p.append_content(p.until("<tool_call>")));
auto json = p.json();
auto tool_call_name = p.add_rule("tool-call-name",
"<name>" << p.capture_tool_call_name(p.until("</name>")) << "</name>");
auto schema = nlohmann::ordered_json::parse(R"({"type": "object"})");
auto tool_call_args = p.add_rule("tool-call-args",
"<args>" << p.capture_tool_call_args(p.schema(json, "get_weather", schema)) << "</args>");
auto tool_call = p.add_rule("tool-call",
"<tool_call>" << p.add_tool_call(tool_call_name << tool_call_args) << "</tool_call>");
return reasoning << p.optional(content) << p.optional(tool_call);
});
// Test complete input
{
std::string input = R"(<think>I need to call get_weather with city = New York</think><tool_call><name>get_weather</name><args>{"city": "New York"}</args></tool_call>)";
parser_environment env;
parser_context ctx(input, &env);
auto result = parser.parse(ctx);
assert_equals(true, result.is_success());
assert_equals(input.size(), result.end);
assert_equals("I need to call get_weather with city = New York", env.reasoning_content);
assert_equals((size_t)1, env.tool_calls.size());
assert_equals("", env.tool_calls[0].id);
assert_equals("get_weather", env.tool_calls[0].name);
assert_equals(R"({"city": "New York"})", env.tool_calls[0].arguments);
}
// Test partial input
{
std::string input = R"(<think>I need to call get_weather )";
parser_environment env = parser_environment();
parser_context ctx = parser_context(input, &env, /* .is_input_complete = */ false);
auto result = parser.parse(ctx);
assert_equals(true, result.is_success());
assert_equals("I need to call get_weather", env.reasoning_content);
}
The generated parse tree can be used to produce a GBNF grammar. The plan is to build the parser during chat param initialization and derive grammar rules with support for lazy triggers. This should support both tool_choice = auto and tool_choice = required.
array ::= "[" space ( value ("," space value)* )? "]" space
boolean ::= ("true" | "false") space
char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
content ::= ([^<] | "<" [^t] | "<t" [^o] | "<to" [^o] | "<too" [^l] | "<tool" [^_] | "<tool_" [^c] | "<tool_c" [^a] | "<tool_ca" [^l] | "<tool_cal" [^l] | "<tool_call" [^>])*
decimal-part ::= [0-9]{1,16}
get-weather ::= object
integral-part ::= [0] | [1-9] [0-9]{0,15}
null ::= "null" space
number ::= ("-"? integral-part) ("." decimal-part)? ([eE] [-+]? integral-part)? space
object ::= "{" space ( string ":" space value ("," space string ":" space value)* )? "}" space
reasoning ::= "<think>" space ([^<] | "<" [^/] | "</" [^t] | "</t" [^h] | "</th" [^i] | "</thi" [^n] | "</thin" [^k] | "</think" [^>])* space "</think>"
root ::= reasoning space content? space tool-call?
space ::= | " " | "\n"{1,2} [ \t]{0,20}
string ::= "\"" char* "\"" space
tool-call ::= "<tool_call>" space tool-call-name space tool-call-args space "</tool_call>"
tool-call-args ::= "<args>" space get-weather space "</args>"
tool-call-name ::= "<name>" space ([^<] | "<" [^/] | "</" [^n] | "</n" [^a] | "</na" [^m] | "</nam" [^e] | "</name" [^>])* space "</name>"
value ::= object | array | string | number | boolean | null
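The grammar above corresponds to `tool_choice = auto`. For illustration (this is a sketch, not generated output), the `required` variant would simply make the tool call mandatory in the root rule:

```
# illustrative: tool_choice = required
root ::= reasoning space content? space tool-call
```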
Specifics
This PR implements parser combinators for PEG grammars. It uses caching to implement packrat parsing. The following are implemented:
Basic Parsers
- `literal(string)` - Matches an exact literal string. `S -> "hello"`
- `any()` - Matches any single character. `S -> .`
- `one(classes)` - Matches a single character from a character class or range. `S -> [a-z]` or `S -> [^0-9]`
- `chars(classes, min, max)` - Matches between min and max repetitions of characters from a character class. `S -> [a-z]{m,n}`. Use -1 for max to represent unbounded repetition `{m,}`
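For illustration, inside a `build_parser` lambda these compose naturally. A minimal sketch (the character-class syntax here is an assumption, and `+` is the sequence operator described below):

```cpp
// Sketch only: an identifier rule built from the basic parsers above.
// One letter or underscore, followed by up to 31 word characters.
auto identifier = p.add_rule("identifier",
    p.one("[a-zA-Z_]") + p.chars("[a-zA-Z0-9_]", 0, 31));
```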
Operators
Parsers can be combined using operator overloading for convenient syntax:
- `~p` - Negative lookahead, equivalent to `negate(p)`. `S -> !A`
- `p1 + p2` - Sequence, matches p1 followed by p2, equivalent to `sequence({p1, p2})`. `S -> A B`
- `p1 | p2` - Choice, matches p1 or p2, equivalent to `choice({p1, p2})`. `S -> A | B`
- `p1 << p2` - Sequence with whitespace in between, equivalent to `sequence({p1, space(), p2})`. `S -> A [ \t\n]* B`
Operators also work with string literals on the left side:
"literal" + p- Sequence starting with a literal string"literal" | p- Choice with a literal string as first alternative"literal" << p- Literal followed by whitespace then parser
Combinators
- `sequence(parsers)` - Matches a sequence of parsers in order, all must succeed. `S -> A B C`
- `choice(parsers)` - Matches the first parser that succeeds from a list of alternatives. `S -> A | B | C`
- `one_or_more(p)` - Matches one or more repetitions of a parser. `S -> A+`
- `zero_or_more(p)` - Matches zero or more repetitions of a parser, always succeeds. `S -> A*`
- `optional(p)` - Matches zero or one occurrence of a parser, always succeeds. `S -> A?`
- `repeat(p, min, max)` - Matches between min and max repetitions of a parser (inclusive). `S -> A{m,n}`. Use -1 for max to represent unbounded repetition `{m,}`
- `repeat(p, n)` - Matches exactly n repetitions of a parser. `S -> A{n}`
- `negate(p)` - Negative lookahead: succeeds if the child parser fails, consumes no input. `S -> !A`
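As a quick illustration of the repetition forms (again a sketch, assuming a builder `p`; the character classes are placeholders):

```cpp
// Sketch: equivalent ways to express repetition with the combinators above.
auto hex4   = p.repeat(p.one("[0-9a-fA-F]"), 4);           // exactly four hex digits, A{4}
auto digits = p.repeat(p.one("[0-9]"), 1, -1);             // unbounded, same shape as one_or_more
auto sign   = p.optional(p.literal("-") | p.literal("+")); // A? over a choice
```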
Utility Parsers
- `space()` - Matches zero or more whitespace characters (space, tab, newline). `S -> [ \t\n]*`
- `until(delimiter, consume_spaces)` - Matches all characters until a delimiter is found (delimiter not consumed). `S -> (!delim .)*`
- `rule(name)` - References a named rule for recursive or reusable grammar definitions. `expr -> term | expr "+" term`
JSON Parsers
- `json()` - Creates a complete JSON parser supporting objects, arrays, strings, numbers, booleans, and null. `value -> object | array | string | number | true | false | null`
- `json_string()` - Specialized single-pass JSON string parser with escape sequence handling
- `json_key(name, p)` - JSON key-value parser for specific object fields
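For example, a hedged sketch of pulling a single field out of a JSON object with `json_key` (treat this as illustrative of the descriptions above, not a fixed API):

```cpp
// Sketch: match the "name" field of an object like {"name": "get_weather", ...}
// and capture the string value as the tool name. unescape_json = true because
// the matched text is a JSON-escaped string.
auto name_field = p.json_key("name",
    p.capture_tool_call_name(p.json_string(), /* unescape_json = */ true));
```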
Semantic Actions
- `action(p, fn)` - Wraps a parser with a semantic action callback. The callback is invoked on successful parse with the result, matched text, and environment. `S -> A [action]`
- `schema(p, name, schema)` - Wraps a parser with JSON schema metadata for grammar generation. Used internally to convert JSON schemas to GBNF grammar rules.
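A minimal sketch of a custom action, assuming the callback receives the parse result, the matched text, and the environment as described (the exact callback signature is an assumption):

```cpp
// Sketch: stash the matched text in the scratchpad via a custom action.
auto named = p.action(p.until("</name>"),
    [](const auto & result, const std::string & text, parser_environment & env) {
        env.scratchpad["last-name"] = text; // illustrative key
    });
```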
Action Convenience Wrappers
- `append_reasoning(p)` - Appends matched text to `env.reasoning_content`
- `append_content(p)` - Appends matched text to `env.content`
- `capture(p, key, unescape_json)` - Captures matched text to `env.scratchpad[key]`. If `unescape_json` is true, the matched text is unescaped as a JSON string
- `capture_tool_call_id(p, unescape_json)` - Captures matched text to `env.tool_call_id`. If `unescape_json` is true, the matched text is unescaped as a JSON string
- `capture_tool_call_name(p, unescape_json)` - Captures matched text to `env.tool_call_name`. If `unescape_json` is true, the matched text is unescaped as a JSON string
- `capture_tool_call_args(p, unescape_json)` - Captures matched text to `env.tool_call_args`. If `unescape_json` is true, the matched text is unescaped as a JSON string
- `add_tool_call(p)` - Adds a tool call to `env.tool_calls` using `env.tool_call_{id,name,args}`. Clears the tool call fields after adding
Rule Management
- `add_rule(name, p)` - Adds a named rule to the grammar for reuse and recursion
The operators `+`, `|`, and `~` construct sequence, choice, and negate parsers respectively. The `<<` operator includes a space rule between parsers.
Drawbacks
- Parsers that match content while excluding certain patterns, such as end tags, have a less obvious syntax. For example, `p.zero_or_more(~(space + p.literal("</think>")) + p.any())` matches any character that isn't followed by `</think>`. The `p.until("</think>")` parser is intended to simplify this.
- Packrat parsing requires caching all intermediate parse results, which introduces memory overhead proportional to input size and grammar complexity.
- Each model still requires a custom parser, though they share a common framework that simplifies implementation.
- Parser combinators may offer less flexibility for handling malformed model output compared to hand-written parsers, though constrained decoding should prevent malformed tool calls.
To do
- [X] Basic implementation
- [X] Support parsing of partial input for streaming
- [X] Implement a JSON parser using parser combinators to replace the current healing system
- [X] Implement `append_content()` and `append_reasoning()` semantic actions to populate content/reasoning fields
- [X] Implement `add_tool_call()`, `capture_tool_call_name()`, `capture_tool_call_args()` semantic actions to handle tool calls
- [X] Construct a GBNF grammar from the final parser
- [ ] Construct a lazy GBNF grammar from the final parser
- [X] Implement `json-schema-to-grammar` support. The JSON parser will parse any JSON, but the generated GBNF grammar should still be constructed from the user-provided schema
- [ ] Allow building of the parser during chat param initialization
Yes! This is exactly what I was thinking about :) can you give me push writes to your repo so I can contribute without doing PRs to PRs?
Sure. I've never managed permissions on a GitHub repo, but let me know if you can't push.
The interface isn't solidified, so hammer away. I do want to clean up the header and move stuff into the source file. Figured I'd handle that as I get further along.
The partial parsing works, but it does require careful attention when editing. The idea is to "succeed" if the parse tree is partially traversed and the input is marked as incomplete, with some caveats: if a literal is partially matched, it will propagate a result indicating we need more input. I intend to add a regex parser that uses the built-in partial regex matching support, which should do the same thing. This allows us to collect the results when sending a streaming response.
I need to clean up the caching. Initially I thought we could reuse the cache as we get more and more input, but I'm finding it very difficult to determine the correct time to cache. So I'm thinking about nixing that idea and just providing a cache per parsing run--as the packrat algorithm originally intended. Then we can profile whether caching is beneficial on a real example. I suspect there shouldn't be a whole lot of backtracking, so the memory cost might not be worth it if the gains are minuscule.
Aight, let me bounce my original idea - what if we just created a GBNF parser builder and used that to parse the messages? Then we have both problems (tool call / reasoning and compatibility with normal parsing) done in one go. Unless (haven't looked into it) it would just be too inefficient for normal content parsing?
Because right now it feels like we're adding another intermediate abstraction while GBNF is already implemented in GGML - so maybe just use a builder as an abstraction layer to create all the needed objects and add any missing partial parse support?
This is just an idea, not very fixated on it, just thought I'd share it. Regarding memory costs and the packrat parser, I think O(n) with typical LLM inputs is negligible; even with super long contexts we're looking at a few MB of overhead at most.
Sounds like you're thinking of a parser generator. Something like yacc, bison, or ANTLR. The problem I see with those solutions is they require building a parse table upfront, which is less intuitive than building a parse tree such as in this PR. You could create a recursive descent parser but that would have to be done at compile time. If you did it at runtime, I think the solution would look a lot like this!
I haven't examined the GBNF code with a scalpel, but from a brief look it seems to use a pushdown automaton, and extracting content from it may be challenging. Not that we would want to, since it is part of the core and not common. I believe there is a desire to keep the chat parsing isolated in common.
I also think you lose the expressiveness of being able to define the grammar in C++. For example, with this solution we could add an `execute()` parser that takes a user lambda and runs it when the parse subtree succeeds. You could define `prune()` to remove parts of the tree on a condition, such as when no tools are provided. Not saying we want to do that, just to demonstrate the flexibility offered.
The solutions I mentioned above do this by defining their own language to insert code--not pretty in my experience.
That said, I am open to ideas. If you have a clearer picture of what that looks like, I'm happy to review. I understand inserting a new abstraction is a tough ask. I wanted to roll out a PoC to hopefully show value.
@aldehir Nah, you're probably right. I looked at the GBNF code and in fact it would take too much effort to extract the parsed content from there. We're better off just doing it your way. I'll try to code some of the missing pieces.
@pwilkin great! If you have any questions, feel free to ask.
Aight, I'm done with the hybrid ops and convert_hf_to_gguf refactoring cleanup, so I'll probably finally look at this tomorrow :>
No rush. I am getting closer to a set of parsing functions that I'm happy with. The unfortunate part is I had to roll specialized parsers to maintain comparable performance with the existing parsing. A lexer would likely help, but optimized parsers for certain use cases are enough for now.
I added a benchmark in the test that implements the Command R7B parser and compares it to the existing one. It seemed like a good one to illustrate.
// Benchmarks are over 100 iterations
Reasoning + Content:
New parser avg: 23 us
Legacy parser avg: 450 us
Reasoning + Tool Call:
New parser avg: 263 us
Legacy parser avg: 151 us
The existing parsing has a leg up with JSON. That said, it's still a fraction of a millisecond for a full prompt. I think most of the cost will go into the constrained decoding anyway. I'll have to benchmark larger JSON documents. Worst case, we can fall back to the implementation in json-partial.cpp. The intent here is to better support streaming JSON.
Fixed the performance discrepancy, it had to do with how I was passing my values up the stack. I removed that and added an environment to maintain state that is manipulated with semantic actions.
Reasoning + Content:
New parser avg: 27 us
Legacy parser avg: 452 us
Reasoning + Tool Call:
New parser avg: 159 us
Legacy parser avg: 157 us
@aldehir I thought it wouldn't make much sense for me to interfere in the key parser parts, so I went for the tests instead.
I made a test case structure that allows grouping tests into compound test cases, and added a test harness that's meant to be similar to the popular test frameworks in other languages. Currently it only supports `assert_equal` (since that was basically the only thing used), but it's easy to add other methods.
I think it's pretty critical that the tests for the parsers are also modularized as much as possible going forward since currently there's a lot of repetition on one hand and not too much coverage on the other hand. Ideally I think we should aim for standardized test-cases that all supported chat formats / templates would have to pass and only limit the tests based on whether the model supports reasoning and/or tool calls.
Sounds good. I wrote a few tests here and there, but I have found it incredibly valuable implementing models to see what works well.
I did rename everything to use the common_chat_* prefix. Since common is ingested as a library, I suspect this is desired. Still stuck on whether it should be common_chat_combinator_parser or common_chat_peg_parser. The latter is shorter.
I think peg_parser is both shorter and more informative, so it would probably be the better way to go.
Renamed to common_chat_peg_parser.
GBNF grammar generation now supports generating lazy grammars by annotating rules with trigger(). The trigger patterns will still need to be handwritten.
Working on unicode support now.
Then I have to figure out how to hook it all up. To start, I'll just create a new one during both param init and parse. Although, it should really be reused for the duration of the generation.
@pwilkin how about this?
diff --git a/tests/CMakeLists.txt b/tests/CMakeLists.txt
index 4850e16f7..e060aa55f 100644
--- a/tests/CMakeLists.txt
+++ b/tests/CMakeLists.txt
@@ -1,13 +1,15 @@
llama_add_compile_flags()
function(llama_build source)
+ set(TEST_SOURCES ${source} ${ARGN})
+
if (DEFINED LLAMA_TEST_NAME)
set(TEST_TARGET ${LLAMA_TEST_NAME})
else()
get_filename_component(TEST_TARGET ${source} NAME_WE)
endif()
- add_executable(${TEST_TARGET} ${source})
+ add_executable(${TEST_TARGET} ${TEST_SOURCES})
target_link_libraries(${TEST_TARGET} PRIVATE common)
install(TARGETS ${TEST_TARGET} RUNTIME)
endfunction()
@@ -83,6 +85,8 @@ function(llama_build_and_test source)
set(multiValueArgs ARGS)
cmake_parse_arguments(LLAMA_TEST "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
+ set(TEST_SOURCES ${source} ${LLAMA_TEST_UNPARSED_ARGUMENTS} get-model.cpp)
+
if (NOT DEFINED LLAMA_TEST_LABEL)
set(LLAMA_TEST_LABEL "main")
endif()
@@ -95,7 +99,7 @@ function(llama_build_and_test source)
get_filename_component(TEST_TARGET ${source} NAME_WE)
endif()
- add_executable(${TEST_TARGET} ${source} get-model.cpp)
+ add_executable(${TEST_TARGET} ${TEST_SOURCES})
install(TARGETS ${TEST_TARGET} RUNTIME)
target_link_libraries(${TEST_TARGET} PRIVATE common)
@@ -181,8 +185,8 @@ endif()
llama_build_and_test(test-chat-parser.cpp)
-# Chat PEG parser tests (modular)
-file(GLOB_RECURSE CHAT_PEG_PARSER_TEST_SOURCES
+llama_build_and_test(
+ test-chat-peg-parser.cpp
chat-peg-parser/simple_tokenizer.cpp
chat-peg-parser/benchmark.cpp
chat-peg-parser/test-actions.cpp
@@ -195,13 +199,7 @@ file(GLOB_RECURSE CHAT_PEG_PARSER_TEST_SOURCES
chat-peg-parser/test-partial-parsing.cpp
chat-peg-parser/test-recursive-references.cpp
chat-peg-parser/tests.h
- test-chat-peg-parser.cpp
)
-add_executable(test-chat-peg-parser ${CHAT_PEG_PARSER_TEST_SOURCES})
-target_link_libraries(test-chat-peg-parser PRIVATE common)
-install(TARGETS test-chat-peg-parser RUNTIME)
-add_test(NAME test-chat-peg-parser COMMAND test-chat-peg-parser)
-set_property(TEST test-chat-peg-parser PROPERTY LABELS main)
llama_build_and_test(test-chat-template.cpp)
llama_build_and_test(test-json-partial.cpp)
Looks like the tests were failing, maybe because the fallback logic to add get-model.cpp wasn't triggering? Either way, I think we can just use the unparsed arguments as additional source files. It keeps the changes minimal. The linker won't include anything from get-model.cpp if there are no references to it, so it doesn't hurt to keep it in.
@aldehir Aight, refactoring done, seems all's well:
==== TEST SUMMARY ====
tests : 67
assertions : 267
failures : 0
exceptions : 0
======================
Perfect!
One nit: you don't need to include .h in the sources. CMake will compile with -MD to obtain header dependencies for source files and include them in the dependency chain. That said, other sections in the code include .h so we can leave it for consistency.
All right, I wrote a couple of helpers + refactored the Qwen3 example parser to use those helpers, let me know what you think - I believe those helpers will capture a lot of the typical implementations.
Wow, I really like the new design. The old chat-parser was not very well documented, and its API sometimes produced unexpected results. Because of that, implementing a new tool-call parser always required reading the chat-parser source code carefully to confirm whether the behavior matched the expectations.
On top of that, the grammar and the parser have been two completely separate modules in the codebase, which made it quite easy for their behaviors to diverge. This new implementation seems to address all of these issues at once, which is a huge improvement!
@pwilkin
Seeing how subtly different these models are, I want to avoid prescribing any hard rules. Otherwise it becomes a matter of tweaking the helpers for every little thing that could be different, and then you have to reconcile the differences to ensure you don't break another model. It's a never-ending game of whack-a-mole. We already see that with `try_parse_reasoning()`: it tries to accommodate every model, which makes it a bit harder to use for the next model that's just different enough to cause problems. I'd rather see the grammar laid out entirely in a function so it is easy to visually inspect and verify.
I think we can get a ton of reuse from the SAX handlers, though. If we prescribe a convention of rule names (e.g. content => content, reasoning => reasoning_content, etc.), then we can create specialized handlers for each type of model. If it emits JSON tool calls, it gets the `json_tool_calling` handler. If it does this quasi-XML, it gets the `quasi_xml_tool_calling` handler. This is possible because the parsing normalizes the model output. Then your parse functions become simple 3-4 line functions--freeing time to come up with a well-crafted grammar!
If we do create helpers, they should be composable (e.g. `p.xml_tag("tag_name", p.add_rule("tag-contents", ...))`), or they should invert control back to the caller via lambdas.
One thing we can benefit from is a helper that loops through all the tools and tool parameters, while resolving their references. It's a small function, but that process is used everywhere.
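Something like this, as a rough sketch (the OpenAI-style `tools` layout and every name here are assumptions, not code from this PR):

```cpp
#include <nlohmann/json.hpp>
#include <string>

using ordered_json = nlohmann::ordered_json;

// Resolve a local "#/$defs/..." reference against the parameters object.
// Assumes references are local and $defs exists when a $ref is present.
static const ordered_json & resolve_ref(const ordered_json & params, const ordered_json & schema) {
    if (schema.contains("$ref")) {
        const std::string ref = schema.at("$ref"); // e.g. "#/$defs/location"
        return params.at("$defs").at(ref.substr(ref.rfind('/') + 1));
    }
    return schema;
}

// Invoke fn(tool_name, param_name, resolved_schema) for every tool parameter.
template <typename Fn>
static void for_each_tool_param(const ordered_json & tools, Fn && fn) {
    for (const auto & tool : tools) {
        const auto & function = tool.at("function");
        const auto & params   = function.at("parameters");
        for (const auto & [name, schema] : params.at("properties").items()) {
            fn(function.at("name").get<std::string>(), name, resolve_ref(params, schema));
        }
    }
}
```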
That said, I think patterns will emerge whether I like them or not! We'll see what works well, and what doesn't. We'll keep your helpers in place. For the example, I think we should produce both because they also serve as documentation.
@hksdpc255 thanks! I absolutely agree with the pain points. I'm hoping this addresses most of them while not creating a new class of problems. I'm hoping to hook it up to an actual model by tomorrow and see how it goes. Right now it's just a set of tests.
> I'm hoping to hook it up to an actual model by tomorrow and see how it goes.
I can recommend several models that cover a wide range of tool-call patterns as reference:
- `Qwen3-Coder` uses a widely adopted and fairly standard tool-calling format.
- `GLM-4.5-Air` has probably the simplest tool-call pattern I've seen.
- `Kimi-K2` appears to be less aligned, and may emit different tool-call formats in normal content (documented) versus reasoning content (undocumented). Its tool names also require an additional workaround, since the actual function name must be extracted from strings like "functions.[real_name]:123456".
- `Apriel-1.5` uses a completely different chat format across the board, including reasoning content, tool-call syntax, and even regular message formatting.
> Kimi-K2 appears to be less aligned, and may emit different tool-call formats in normal content (documented) versus reasoning content (undocumented). Its tool names also require an additional workaround, since the actual function name must be extracted from strings like "functions.[real_name]:123456".
I noticed that, so I created a parser `json_string_unquoted()` that matches content in JSON strings. So I could do this:
"\"functions." + p.capture("tool-name", p.json_string_unquoted()) + ":" + p.capture("tool-call-id", p.json_string_unquoted()) + "\""
It could benefit from a shorter name.
Thanks for the list, I think Qwen3-Coder is the easiest I can run with my current hardware.
EDIT: `json_string_content` is probably a better name, but still a little long.
I'm thinking we should put the helpers in a separate file. The parser implementation is pretty big. It feels complete, though.
@aldehir Yeah, I split off the helper as a subclass of the main builder and will add any further helpers there; that should avoid overfilling the main parser class.
I also reverted the old explicit Qwen3 parser builder and added the new helper alongside it. Restructured the test a bit to make it clearer. Now I'm going to try and add as many of the old parsers as possible to see how well it'll go and potentially get good patterns for the helpers.
Aight, Minimax M2 and Seed-OSS are up. With the first one, I made the stupid mistake of writing tool definitions that differed from the tool calls, so I couldn't get a proper parse. So I added some debugging prints, plus a live example of how to use them :)
BTW, the current behavior is that if an incorrect function call is detected, it's still marked as a success, since `zero_or_more` always succeeds. Not sure if we want to propagate a failure somehow (as in, `zero_or_more` only trivially succeeds if the rest is empty?).
Thanks!
I just found a case for keeping tests to one source file each: it's a little hard to test in isolation :). If each were its own source file, you could run `ctest -V -R test-chat-peg-parser` to run all the tests, or `-R test-chat-peg-parser-example-qwen3` to run one (or however many match that prefix).
By incorrect, do you mean the model generated an invalid tool call? I don't think that should happen in practice. With constrained decoding, we enforce the grammar, so the output should be parseable. If there are no tools, then we shouldn't constrain, and we should make the reasoning/content parsing as permissive as possible. We also shouldn't build a parser with tool-calling support in that case; it should just consume all content until the end.
You can add p.end() to the end to ensure that everything is consumed, but I found a bug when min repetitions == 0. I'll push out a fix here in a bit.
Ok, to better support writing custom helpers and simplify a few things, I'm going to:
- Introduce `ref()` to reference a rule. The `rule()` function will be the actual rule definition. This replaces `add_rule()`. At the end we can resolve the rule references by traversing the parse tree. With that in place, helpers don't have to subclass the builder. They can just use the builder to generate a subtree of rules. Users of helpers can attach that subtree to their own parser.
- Remove `trigger()`, instead add it as an attribute to rules. I think it's ok to say only rules can be triggers.
- Add an `annotation` property to rules. I noticed that we need to perform the same logic in the event handler for certain nodes, but they can't be named the same. We can use the annotation field instead.
Yeah, was thinking something similar, either add an extra property or make the rule name itself structured somehow (as in "category" and "name").
Alright, I added some stuff. Besides doing a temporary workaround for the double-event problem (I renamed `arg-string-content` to `arg-str-content` to fix the double match):
- I added an option for selective testing: you can now run all tests with `test-chat-peg-parser`, or enumerate the tests to run only the selected ones; `--help` lists the available tests
- I refactored all the printouts to use `LOG_ERR` to help with the interleaving conflicts caused by buffering when mixing C and C++ printouts
- I fixed the helpers to correctly capture argument and function names
Besides that, I fixed in the other tests the one thing that I already fixed in the Minimax-M2 test but forgot to mention: the logic for determining whether you should do a complete parse was wrong, because `std::accumulate` is, like most "substring" functions, exclusive, so you actually have to do `it + 1` instead of `it`, and likewise check `it + 1 == tokens.end()` instead of `it == tokens.end()`.