Documentation request: State and Context

Open stefnotch opened this issue 9 months ago • 10 comments

Chumsky has the concepts of state and context, quite a few combinators that do things with them, and a standalone map_ctx function.

I personally never quite managed to wrap my head around that feature set. What is it intended to be used for, and how does one work with it? For example, how do I take an existing context and change it into a new type of context?

stefnotch avatar Sep 26 '23 21:09 stefnotch

'State' is just a mutable value that gets carried around during parsing. It can contain arbitrary data and can be whatever you like: common choices include string interners and arena allocators (or both! Just make a struct containing both of them). There are good examples of both in the doc examples of fold_with_state and map_with_state.
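To make the idea of state concrete, here is a rough sketch in plain Rust that deliberately does not use chumsky's API. The `Interner` type and the `parse_idents` function are made up for illustration; the only point is that a single mutable value is handed to every parsing step and accumulates data as parsing proceeds.

```rust
use std::collections::HashMap;

/// Hypothetical string interner: maps each distinct string to a small integer id.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, usize>,
    names: Vec<String>,
}

impl Interner {
    /// Returns the existing id for `s`, or assigns the next free id.
    fn intern(&mut self, s: &str) -> usize {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = self.names.len();
        self.ids.insert(s.to_string(), id);
        self.names.push(s.to_string());
        id
    }
}

/// Toy "parser": splits on whitespace and interns every word.
/// `state` plays the role of parser state: one mutable value shared by all steps.
fn parse_idents(input: &str, state: &mut Interner) -> Vec<usize> {
    input
        .split_whitespace()
        .map(|word| state.intern(word))
        .collect()
}
```

Calling `parse_idents("a b a c b", &mut state)` with a fresh `Interner` yields the ids `[0, 1, 0, 2, 1]`, and the interner keeps the three distinct names afterwards.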

Context is different: it's some information passed from a parser earlier in the input to one later in the input that changes how the latter behaves. For example, an indentation-sensitive parser might pass the indentation level from the first indentation parser to all of the trailing indentation in a block, allowing it to properly parse indentation (despite this usually being a context-sensitive thing that recursive descent parsers struggle with). Here's an example of this.
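The indentation case can be modelled with a small hand-written sketch. This is plain Rust, not chumsky code, and the names `indent_of`, `line_with_ctx` and `block` are invented for the example. The first line of a block produces the context (its indentation width), and every later line is parsed against that value.

```rust
/// Counts the leading spaces of a line.
fn indent_of(line: &str) -> usize {
    line.len() - line.trim_start_matches(' ').len()
}

/// Parses one line under a given context: the indentation set by the block's first line.
/// Returns the line's content only if its indentation matches the context.
fn line_with_ctx<'a>(line: &'a str, ctx: usize) -> Option<&'a str> {
    if indent_of(line) == ctx {
        Some(line.trim_start_matches(' '))
    } else {
        None
    }
}

/// Parses a block: the first line creates the context, later lines consume it.
/// Parsing stops at the first line whose indentation differs.
fn block(input: &str) -> Vec<&str> {
    let mut lines = input.lines();
    let Some(first) = lines.next() else {
        return Vec::new();
    };
    let ctx = indent_of(first); // context produced by the earlier parser
    let mut out = vec![first.trim_start_matches(' ')];
    for line in lines {
        match line_with_ctx(line, ctx) {
            Some(content) => out.push(content),
            None => break,
        }
    }
    out
}
```

For the input `"  a\n  b\n    c\n  d"`, `block` returns `["a", "b"]`: the third line is indented more deeply than the context allows, so the block ends there.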

Hopefully I'll get the time to write up proper long-form guides for both of these soon.

zesterer avatar Sep 26 '23 22:09 zesterer

Btw, I think there's a typo in the documentation for ignore_with_ctx. It should probably say "if you do" https://github.com/zesterer/chumsky/blob/af4e2b12cf1447e0e321af80040975254a4aa6a4/src/lib.rs#L1098

stefnotch avatar Sep 27 '23 10:09 stefnotch

Yep, you're right. I think this was due to the combinator roles being switched at one point.

zesterer avatar Sep 27 '23 11:09 zesterer

For when you're going to write up a guide, here are a few questions that I personally have:

  • How do I take an existing context, and change it into a new type of context? I did eventually stumble upon .map_with_ctx(|_, ctx| ctx.clone()).ignore_with_ctx(map_ctx(|ctx| do something with the ctx)), but that seems pretty convoluted.

  • How do I start a parser with a given context? parser.with_ctx(...).parse(input) doesn't immediately work.

use chumsky::{extra, prelude::EmptyErr, text, IterParser, Parser};

#[derive(Clone, Debug, Default)]
struct TestContext {
    value: char,
}

#[test]
fn test_chumsky_recursive_context() {
    let number = text::digits::<char, &str, extra::Full<EmptyErr, (), TestContext>>(10)
        .exactly(1)
        .collect::<Vec<_>>()
        .map_with_ctx(|result, ctx| {
            println!("result: {:?}, ctx: {:?}", result, ctx);
            if ctx.value == result[0] {
                Some(result[0])
            } else {
                None
            }
        })
        .boxed();

    assert_eq!(
        number
            .with_ctx(TestContext { value: '1' })
            .parse("1")
            .into_output(),
        Some(Some('1'))
    );
}

  • How do I only temporarily change the context? Like how do I express "then run indent parser with current context plus 2"?

stefnotch avatar Sep 27 '23 15:09 stefnotch

How do I take an existing context, and change it into a new type of context?

There is a map_ctx function. Unlike most combinators, it's a standalone function used in prefix position, to indicate that it changes the context that gets passed into the parser rather than mapping some output.

How do I start a parser with a given context?

with_ctx should work. What problem are you running up against?

How do I only temporarily change the context? Like how do I express "then run indent parser with current context plus 2"?

Context is local to a section of the parser tree. When using map_ctx, say, the mapped context will only apply to the parser given by the second argument. The new context will only be observable by that parser.

zesterer avatar Sep 27 '23 15:09 zesterer

If you imagine a parser as a tree with ordered children, context is 'created' by a node, and can only be used by nodes that share the same parent and come after the creator, and children of those nodes. It flows 'forwards and down', which is the opposite of most parse results, which flow 'backwards and up' from where they are generated until they are returned as output at the top of the tree.
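The "forwards and down" flow, and the earlier "current context plus 2" question, can be illustrated with a toy recursive-descent sketch in plain Rust. This is not chumsky's API; `Node`, `indent_of` and `parse_block` are hypothetical names. Context is a by-value argument handed to callees, so a changed value is visible only inside that call, while output travels back up as the return value.

```rust
/// Toy tree produced by the parser: a line of text plus its nested children.
#[derive(Debug, PartialEq)]
struct Node {
    text: String,
    children: Vec<Node>,
}

/// Counts the leading spaces of a line.
fn indent_of(line: &str) -> usize {
    line.len() - line.trim_start_matches(' ').len()
}

/// Parses all consecutive lines indented by exactly `ctx` spaces.
/// `ctx` is the context: passed down by value, so a callee cannot alter the caller's copy.
/// `pos` is the cursor into `lines`; the returned Vec<Node> is the output flowing back up.
fn parse_block(lines: &[&str], pos: &mut usize, ctx: usize) -> Vec<Node> {
    let mut nodes = Vec::new();
    while *pos < lines.len() && indent_of(lines[*pos]) == ctx {
        let text = lines[*pos].trim_start_matches(' ').to_string();
        *pos += 1;
        // "Run the indent parser with the current context plus 2":
        // only this recursive call observes the new value.
        let children = parse_block(lines, pos, ctx + 2);
        nodes.push(Node { text, children });
    }
    nodes
}
```

After the recursive call returns, the loop continues with the original `ctx`, which is the sense in which the change is temporary: nothing has to be restored because the parent's value was never modified.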

CraftSpider avatar Sep 27 '23 16:09 CraftSpider

How do I take an existing context, and change it into a new type of context?

There is a map_ctx function. Unlike most combinators, it's a standalone function used in prefix position, to indicate that it changes the context that gets passed into the parser rather than mapping some output.

How do I start a parser with a given context?

with_ctx should work. What problem are you running up against?

I see. Apparently I'm running into the same type inference issue in both cases. The piece of code above should demonstrate the problem. map_ctx and with_ctx allow for an arbitrary, unrelated output type. This means that in most cases, I run into the Rust compiler saying that it doesn't know which type to use.

For example:

error[E0284]: type annotations needed
  --> parser\tests\chumsky_tests.rs:99:27
   |
99 |         let base_number = map_ctx(|ctx: &TestContext| ctx.clone(), number).boxed();
   |                           ^^^^^^^ cannot infer type of the type parameter `E` declared on the function `map_ctx`
   |
   = note: cannot satisfy `<_ as ParserExtra<'_, &str>>::Context == TestContext`
help: consider specifying the generic arguments
   |
99 |         let base_number = map_ctx::<Boxed<'_, '_, &str, char, chumsky::extra::Full<EmptyErr, (), TestContext>>, char, &str, E, chumsky::extra::Full<EmptyErr, (), TestContext>, _>(|ctx: &TestContext| ctx.clone(), number).boxed();
   |                                  +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Without an explanation, my default assumption was "I'm doing something wrong". But I'm glad to see that map_ctx::<_, _, _, extra::Full<EmptyErr, (), _>, _, _> does indeed work, even if it's pretty verbose.

The with_ctx example would be: [screenshot not preserved]

How do I only temporarily change the context? Like how do I express "then run indent parser with current context plus 2"?

Context is local to a section of the parser tree. When using map_ctx, say, the mapped context will only apply to the parser given by the second argument. The new context will only be observable by that parser.

Ah, I see. Thank you!

stefnotch avatar Sep 27 '23 17:09 stefnotch

Oh hmm, looks like we need to thread E through as a phantom type parameter. Let me do that now.

zesterer avatar Sep 27 '23 19:09 zesterer

If you hitch yourself up to the latest commit, it should work fine without the explicit type parameter now.

zesterer avatar Sep 27 '23 19:09 zesterer

Woah, map_ctx is really nice to use now! Thank you very much.

stefnotch avatar Sep 27 '23 19:09 stefnotch