
Clarify the distinction between State and Context in documentation

Open Wybxc opened this issue 5 months ago • 3 comments

The current documentation states that the difference between State and Context is that Context can influence the parsing progress of the Parser, while State cannot. However, this is not the case. The Parser::try_map_with method allows reading the current State and Context, and based on this, it can decide whether to fail the parsing at this point. This indicates that State, just like Context, has the ability to alter the parsing progress.

In my understanding, the real distinction between State and Context lies in their respective scopes. Simply put, State is "left-to-right," while Context is "top-down." If we organize all parsers into a tree, a parser can read State from previously parsed content, even if that content does not come directly from the parser's parent node (e.g., it could come from a left sibling of an ancestor node in the parsing tree, as long as its parsing occurred before this node). On the other hand, a parser can only read Context from one of its ancestor nodes (via Parser::then_with_ctx or similar mechanisms).
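To make the scoping difference concrete, here is a toy sketch in plain Rust. It does not use chumsky's API: `parse_group`, `Record`, and `demo` are hypothetical names, `depth` stands in for Context (handed down from the parent only), and `seen` stands in for State (one mutable value threaded through the whole parse).

```rust
use std::iter::Peekable;
use std::str::Chars;

/// One record per parsed letter:
/// (letter, depth received as "context", letters already recorded in "state").
type Record = (char, usize, String);

/// Parses letters and nested `( ... )` groups.
/// `depth` is passed by value from parent to child (top-down).
/// `seen` is a single mutable value shared by every call (left-to-right).
fn parse_group(
    input: &mut Peekable<Chars<'_>>,
    depth: usize,
    seen: &mut String,
    out: &mut Vec<Record>,
) {
    while let Some(&c) = input.peek() {
        input.next();
        match c {
            // A child group only ever sees the depth its ancestors gave it.
            '(' => parse_group(input, depth + 1, seen, out),
            ')' => return,
            _ => {
                out.push((c, depth, seen.clone()));
                // Anything written here is visible to every later parser,
                // whether or not it is a descendant of this one.
                seen.push(c);
            }
        }
    }
}

fn demo() -> Vec<Record> {
    let mut input = "a(b(c)d)e".chars().peekable();
    let mut seen = String::new();
    let mut out = Vec::new();
    parse_group(&mut input, 0, &mut seen, &mut out);
    out
}
```

In this sketch, `d` is parsed at depth 1 and still sees `c` in `seen`, even though `c` was parsed inside a sibling group rather than by one of `d`'s ancestors. The depth, by contrast, is only ever what the enclosing groups passed down.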

I am currently implementing a C language parser, where left context is needed to distinguish between variable names and type names. In this use case, I found that using State rather than Context to store the context allows for a more natural implementation.
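A minimal sketch of that use case, again in plain Rust rather than chumsky: the set of known type names is mutable state that grows as declarations are parsed, and later identifiers are classified against it. The `Ident` enum, the `classify` function, and the drastically simplified "grammar" (statements separated by `;`, with `typedef <existing> <new>` as the only declaration form) are assumptions made for illustration only.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Ident {
    TypeName(String),
    VarName(String),
}

/// Classifies every token of every non-typedef statement as a type name or
/// a variable name, based on the typedefs seen so far (left context).
fn classify(src: &str) -> Vec<Ident> {
    // Stand-in for parser state: type names declared up to this point.
    let mut type_names: HashSet<String> = HashSet::new();
    type_names.insert("int".to_string());
    let mut out = Vec::new();

    for stmt in src.split(';') {
        let toks: Vec<&str> = stmt.split_whitespace().collect();
        // `typedef <existing> <new>` registers a new type name in the state.
        if let ["typedef", _existing, new_name] = toks.as_slice() {
            type_names.insert(new_name.to_string());
            continue;
        }
        for t in toks {
            if type_names.contains(t) {
                out.push(Ident::TypeName(t.to_string()));
            } else {
                out.push(Ident::VarName(t.to_string()));
            }
        }
    }
    out
}
```

With the input `"T x; typedef int T; T y"`, the first `T` is classified as a variable name and the second as a type name, because the typedef between them has updated the state. The typedef statement is not an ancestor of `T y` in the parse tree; it is simply to its left.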

If my understanding is correct, please consider revising the documentation's description of State and Context.

Wybxc avatar Aug 09 '25 03:08 Wybxc

A somewhat related question: I have parsers that parse a tree-like structure. I also have some output "visitors", i.e. traits that have a new() -> Self, an add_child(&mut self, child) on each node, and an add_child(&mut self, child) on the parent. I can create the root-level state before parsing without any problem, but I can't figure out how to construct a child from the things that were just parsed.

I'm thinking of having a way to go from state being T to state being U, where U is constructed from the current output. Then, in the next .then(...) combinator, I can use &mut U to feed data into U, and later I need access to both U and &mut T to combine the children (this should also be able to produce an error that doesn't stop parsing, but that's another story). Is this possible with State, or is it something I need Context for? In my case, the data I'm collecting will never be used to change the parsing behaviour, and the documentation says:

This can affect the output of a parser, but for things that don’t wish to alter the actual rules of parsing, one should instead prefer Self::State.

Ideally there would be some combinators:

trait Parser</* I'm leaving out all other types*/ /* Output: */ O, /* State: */ T> {
    fn then_with_state_from_self( // no idea how to name it
        self,
        parser: impl Parser</* State: */ U>,
        f: impl Fn(O) -> U,
    ) -> impl Parser</* Output: */ U, /* State: */ T>; // this allows later using map_with to combine it, but that could also be done by another function passed in:

    // alternative maybe better for my tree approach:
    fn tree_like( // again I have no idea how to name it
        self,
        create: impl Fn(O) -> U,
        children: impl Parser</* Output: */ (), /* State: */ U>,
        combine: impl Fn(&mut T, U), // (this can potentially fail, but failing shouldn't stop parsing, so emitter should also be given)
    ) -> impl Parser</* Output: */ (), /* State: */ T>;
}
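The control flow I have in mind for tree_like can be sketched with plain functions and closures. This is not chumsky code and says nothing about how such a combinator would be implemented there; the `Tree` type and the closure-based `tree_like` function below are hypothetical stand-ins, with `output` playing the role of the parser output O.

```rust
#[derive(Debug, PartialEq)]
struct Tree {
    label: String,
    children: Vec<Tree>,
}

/// Builds a child state `U` from an output `O`, lets `children` fill it in,
/// then folds the finished child into the parent state `T`.
fn tree_like<O, T, U>(
    output: O,
    parent: &mut T,
    create: impl Fn(O) -> U,
    children: impl Fn(&mut U),
    combine: impl Fn(&mut T, U),
) {
    let mut child = create(output); // state switches from T to U here
    children(&mut child); // the "child parsers" only see &mut U
    combine(parent, child); // back in T, with the finished U in hand
}

fn demo() -> Tree {
    let mut root = Tree { label: "root".to_string(), children: vec![] };
    tree_like(
        "section",
        &mut root,
        |name: &str| Tree { label: name.to_string(), children: vec![] },
        |sec: &mut Tree| {
            sec.children.push(Tree { label: "leaf".to_string(), children: vec![] })
        },
        |parent: &mut Tree, sec: Tree| parent.children.push(sec),
    );
    root
}
```

Here `demo()` returns a root with one "section" child, which in turn holds one "leaf". The open question is whether State (or Context) can express the temporary switch from T to U in the middle.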

zeichenreihe avatar Aug 10 '25 15:08 zeichenreihe

This indicates that State, just like Context, has the ability to alter the parsing progress.

If this is correct, it would mean I should just switch to using Context in my use case, and my question is resolved. (This also fits neatly with the "left-to-right" and "top-down" description above.)

zeichenreihe avatar Aug 10 '25 15:08 zeichenreihe

I have added a section to the guide about parser state, if it helps. I intend to do the same about parse context... at some point.

In my understanding, the real distinction between State and Context lies in their respective scopes. Simply put, State is "left-to-right," while Context is "top-down."

Hmm, somewhat. A better way to think of it might be "state assists producing outputs, context assists parsing inputs". State is used to implement things like intern tables, arena allocators, etc. Context is passed through the parser during parsing to change the behaviour of later parsers based on what was seen before (i.e: to facilitate context-sensitive parsing).
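As a rough illustration of that split, here is a toy sketch in plain Rust (not chumsky's API; `Interner`, `parse_words`, and `parse_len_prefixed` are hypothetical names). The first function uses state purely to build its output, while the second lets an earlier result decide how the following input is consumed.

```rust
use std::collections::HashMap;

/// Stand-in for parser state: a simple intern table.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, usize>,
}

impl Interner {
    /// Returns the existing ID for `s`, or assigns the next free one.
    fn intern(&mut self, s: &str) -> usize {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = self.ids.len();
        self.ids.insert(s.to_string(), id);
        id
    }
}

/// "State assists producing outputs": the grammar (whitespace-separated
/// words) is the same whatever the interner contains; only the output
/// (symbol IDs) depends on it.
fn parse_words(src: &str, state: &mut Interner) -> Vec<usize> {
    src.split_whitespace().map(|w| state.intern(w)).collect()
}

/// "Context assists parsing inputs": a leading digit is parsed first and
/// then determines how many bytes the payload parser consumes.
/// Returns (payload, remaining input), or None on malformed input.
fn parse_len_prefixed(src: &str) -> Option<(&str, &str)> {
    let mut chars = src.chars();
    let len = chars.next()?.to_digit(10)? as usize; // the "context"
    let rest = chars.as_str();
    let payload = rest.get(..len)?; // fails if the input is too short
    Some((payload, &rest[len..]))
}
```

With a fresh interner, `parse_words("a b a c b", ..)` yields `[0, 1, 0, 2, 1]`. `parse_len_prefixed("3abcde")` yields `Some(("abc", "de"))`, while `parse_len_prefixed("5ab")` yields `None` because the prefix promises more input than there is.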

zesterer avatar Aug 11 '25 18:08 zesterer