chumsky

Specialize Context

Open • DasLixou opened this issue 1 month ago • 13 comments

It would be helpful to change the context based on already parsed data.

e.g.

ascii::ident()
    .spez_ctx(|ident, _old| matches!(ident, "special"))
    .then(just("!").contextual().configure(|_, ctx| *ctx))

to parse the ! only when the identifier is "special".

DasLixou • Dec 07 '25 00:12

You can achieve this like so (pseudocode incoming; you may need to poke it to make it compile):

ascii::ident()
    .then_with_ctx(just('!')
        .contextual()
        .configure(|_, ident| matches!(*ident, "special")))

Is this sufficient, or was this just a reductive example?

zesterer • Dec 07 '25 13:12

It was reductive. The "problem" is that I have some more parsing between the special and the !, which is in fact also an ident. What I actually want to build is an HTML parser: I'd parse <[ident] [attr=expr]*> and then, depending on the ident, parse the body and the closing tag, unless the ident is one of the self-closing elements. But I'll see if I can do it your way, even if it ends up pretty ugly and nested.
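For comparison, here is a minimal hand-written sketch of that context-sensitive step in plain Rust, without chumsky: the name parsed from the opening tag decides whether a body and a matching closing tag follow. `parse_element`, `ident`, `Node`, and the short `VOID_ELEMENTS` list are hypothetical, and attribute values are restricted to bare identifiers to keep it small.

```rust
// Plain-Rust sketch (no chumsky): the tag name parsed first decides
// whether a body and a closing tag are parsed afterwards.

/// Hypothetical, deliberately incomplete list of void (self-closing) elements.
const VOID_ELEMENTS: &[&str] = &["br", "hr", "img", "input", "meta"];

#[derive(Debug, PartialEq)]
pub enum Node {
    Text(String),
    Element {
        name: String,
        attrs: Vec<(String, String)>,
        children: Vec<Node>,
    },
}

/// Parses an identifier of ASCII letters, digits and '_' (not starting with a digit).
fn ident(input: &str) -> Result<(&str, &str), String> {
    let end = input
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(input.len());
    if end == 0 || input.as_bytes()[0].is_ascii_digit() {
        return Err("expected identifier".to_string());
    }
    Ok((&input[..end], &input[end..]))
}

/// Parses `<name key=value ...>` and, unless `name` is a void element,
/// the children and the matching `</name>`. Returns the node and the rest.
pub fn parse_element(input: &str) -> Result<(Node, &str), String> {
    let rest = input.strip_prefix('<').ok_or("expected '<'")?;
    let (name, mut rest) = ident(rest)?;

    // Attributes: whitespace-separated `key=value` pairs (values are identifiers here).
    let mut attrs = Vec::new();
    loop {
        rest = rest.trim_start();
        if let Some(after) = rest.strip_prefix('>') {
            rest = after;
            break;
        }
        let (key, after) = ident(rest)?;
        let after = after.strip_prefix('=').ok_or("expected '='")?;
        let (value, after) = ident(after)?;
        attrs.push((key.to_string(), value.to_string()));
        rest = after;
    }

    // The context-dependent step: void elements have no body and no closing tag.
    if VOID_ELEMENTS.contains(&name) {
        let node = Node::Element { name: name.to_string(), attrs, children: Vec::new() };
        return Ok((node, rest));
    }

    // Otherwise parse children until the closing tag built from the parsed name.
    let closing = format!("</{name}>");
    let mut children = Vec::new();
    loop {
        if let Some(after) = rest.strip_prefix(closing.as_str()) {
            let node = Node::Element { name: name.to_string(), attrs, children };
            return Ok((node, after));
        } else if rest.is_empty() || rest.starts_with("</") {
            return Err(format!("expected {closing}"));
        } else if rest.starts_with('<') {
            let (child, after) = parse_element(rest)?;
            children.push(child);
            rest = after;
        } else {
            let end = rest.find('<').unwrap_or(rest.len());
            children.push(Node::Text(rest[..end].to_string()));
            rest = &rest[end..];
        }
    }
}
```

Building the expected closing tag from the already-parsed name is exactly the part that needs the earlier output as context.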

DasLixou • Dec 07 '25 13:12

Okay, I'm reaching my limits with this minimal repro:

fn parser<'s>() -> impl Parser<'s, &'s str, ()> {
    recursive(|node| {
        text::ascii::ident()
            .then_with_ctx(node)
            .map(|_| ())
    })
}

DasLixou • Dec 07 '25 13:12

Okay, this gets even weirder.

fn parser<'s>() -> impl Parser<'s, &'s str, ()> {
    recursive(|node| {
        text::ascii::ident()
            .then_with_ctx(
                node.with_ctx(())
                    .contextual()
                    .configure(|_, name| matches!(*name, "tada")),
            )
            .map(|v| println!("{v:?}"))
    })
}

This solves the problem, but I think what happens now is that the contextual parser is treated like an "error": the type of v contains the contextual parser's output as if it were always parsed, and when it's disabled, the map isn't triggered at all.

Is there a way to A) make this cleaner, and B) have the contextual parser output an Option<O> instead of acting as a half-error?
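Part B can be made concrete with a toy model in plain Rust (not chumsky's API): a gated parser that, when disabled, succeeds without consuming input and yields None, and otherwise wraps the inner output in Some. `gated_optional` and `bang` are hypothetical names. In chumsky terms this is presumably close to appending `.or_not()` to the contextual parser, with the caveat that `or_not` would also turn a failure of the enabled parser into None, which this model does not.

```rust
// Toy model (not chumsky's API): a context-gated parser that yields
// Option<T> instead of failing when it is disabled.

/// Runs `parser` only if `enabled`; a disabled parser succeeds with `None`
/// and leaves the input untouched. An enabled parser that fails still fails.
pub fn gated_optional<'a, T>(
    enabled: bool,
    parser: impl Fn(&'a str) -> Option<(T, &'a str)>,
    input: &'a str,
) -> Option<(Option<T>, &'a str)> {
    if enabled {
        // Enabled: behave like the inner parser, wrapping its output in Some.
        parser(input).map(|(out, rest)| (Some(out), rest))
    } else {
        // Disabled: succeed with None without consuming anything.
        Some((None, input))
    }
}

/// Hypothetical inner parser: a single '!'.
pub fn bang(input: &str) -> Option<(char, &str)> {
    input.strip_prefix('!').map(|rest| ('!', rest))
}
```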

DasLixou • Dec 07 '25 14:12

I'm even more confused. Why does

end().map(|_| panic!("huh"))

never panic?!

DasLixou • Dec 07 '25 14:12

Chumsky assumes that the function passed to map is pure, and optimizes the call away when possible. A common case for this is when the output is ignored.

It's a bit hidden, but this is documented in the technical notes of the guide:

https://docs.rs/chumsky/latest/chumsky/guide/_06_technical_notes/index.html
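The toy model below (plain Rust, not chumsky's actual internals) illustrates the idea: every parser is told whether its output is needed (Emit) or only success matters (Check), and a combinator that discards its child's output runs the child in Check mode, so a map closure inside it is never called. `Mode`, `digits`, `map`, and `ignored` are hypothetical names; chumsky's own mode machinery is more involved.

```rust
// Toy model, not chumsky's real internals: each parser is told whether its
// output is needed (Emit) or whether only success/failure matters (Check).

#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Mode {
    Emit,
    Check,
}

/// Parses one or more ASCII digits; the String is only built in Emit mode.
pub fn digits(input: &str, mode: Mode) -> Option<(Option<String>, &str)> {
    let end = input.find(|c: char| !c.is_ascii_digit()).unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    let out = (mode == Mode::Emit).then(|| input[..end].to_string());
    Some((out, &input[end..]))
}

/// `map`-like combinator: `f` runs only if an output exists (Emit mode).
pub fn map<'a, T, U>(
    parser: impl Fn(&'a str, Mode) -> Option<(Option<T>, &'a str)>,
    f: impl Fn(T) -> U,
) -> impl Fn(&'a str, Mode) -> Option<(Option<U>, &'a str)> {
    move |input, mode| {
        let (out, rest) = parser(input, mode)?;
        Some((out.map(&f), rest))
    }
}

/// Discards the inner output, so the inner parser is always run in Check mode.
pub fn ignored<'a, T>(
    parser: impl Fn(&'a str, Mode) -> Option<(Option<T>, &'a str)>,
) -> impl Fn(&'a str, Mode) -> Option<(Option<()>, &'a str)> {
    move |input, mode| {
        let (_, rest) = parser(input, Mode::Check)?;
        Some(((mode == Mode::Emit).then_some(()), rest))
    }
}
```

With this design a side-effecting closure, like the panic!("huh") above, silently disappears whenever its output is discarded, which is why map closures are expected to be pure.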

Zij-IT • Dec 07 '25 15:12

Oh wow, thanks! In my example it seems the () output just made it skip the panic completely, since the result was Some(()). Still, is there currently a way to do contextual parsing and use its output in both cases? I've also tried implementing my own parser, but all the Emit etc. types are private :/

DasLixou • Dec 07 '25 15:12

Thinking about it a bit more, ideally this would look something like this:

top
    .then(branch! {
        // branch based on the previous output
        (_, "special") => just(">").to(None),
        _ => more.map(Some),
    })
    .validate(...)

DasLixou • Dec 07 '25 15:12

> Thinking about it a bit more, ideally this would look something like this

Perhaps the following would do?

// One of the choice parsers will be chosen
top.then_with_ctx(choice((
").to(None)">
    just(">").to(None).contextual().configure(|_, top| *top == "special"),
    more.map(Some).contextual().configure(|_, top| *top != "special"),
)))
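The semantics of that suggestion can be modelled in plain Rust (not chumsky's API): each branch has a gate evaluated on the context, a disabled branch simply fails, and the choice takes the first branch that succeeds. `Branch`, `choice`, `close_only`, and `word_then_close` are hypothetical stand-ins for the contextual just(">") and more branches.

```rust
// Toy model (not chumsky's API) of `choice` over context-gated branches.

/// One alternative: a gate on the context plus a parser producing Option<&str>.
pub struct Branch {
    pub enabled: fn(&str) -> bool,
    pub parse: fn(&str) -> Option<(Option<&str>, &str)>,
}

/// Tries each enabled branch in order and returns the first success.
/// A disabled branch behaves like a branch that failed immediately.
pub fn choice<'a>(
    branches: &[Branch],
    ctx: &str,
    input: &'a str,
) -> Option<(Option<&'a str>, &'a str)> {
    branches
        .iter()
        .filter(|b| (b.enabled)(ctx))
        .find_map(|b| (b.parse)(input))
}

/// Stand-in for `just(">").to(None)`.
pub fn close_only(input: &str) -> Option<(Option<&str>, &str)> {
    input.strip_prefix('>').map(|rest| (None, rest))
}

/// Stand-in for `more.map(Some)`: everything up to '>' counts as the body.
pub fn word_then_close(input: &str) -> Option<(Option<&str>, &str)> {
    let end = input.find('>')?;
    Some((Some(&input[..end]), &input[end + 1..]))
}
```

In this model, gating only the first branch lets the second one run even when the context is "special", which matches the behaviour reported a little further down the thread.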

zesterer • Dec 07 '25 15:12

But then the value doesn't get propagated, right? I can't work with the Option afterwards. Oh wait, does it work because the first one is configured as an "error", so choice then uses the value of the second one? Then I don't even need the != contextual on the second one, correct?

DasLixou • Dec 07 '25 15:12

@zesterer Your code works, but only if both branches get a configure, because otherwise, when the contextual one fails, the other one is simply used instead. Okay. But how does it now know what to report as "missing" on an error?

DasLixou • Dec 07 '25 16:12

> But how does it now know what to recommend "missing" on an error?

That information gets built up by ariadne as the parse tree is explored, so it should work properly, I think?

zesterer • Dec 07 '25 19:12

No, what I meant is: when the choice sees "A errored because it was contextually disabled" and "B errored because it didn't begin with <", how does it know to report only the latter? And for "A errored because of a wrong symbol" and "B errored because it was contextually disabled", how does it know to report only the former?

DasLixou • Dec 07 '25 19:12

Chumsky has some built-in error prioritisation heuristics that allow it to report errors in a vaguely sensible manner, as you describe.

For the most part, the current heuristic is just 'the parser branch that made the most progress is the one that gets to have its error reported', but some error types (like Rich) also support merging error information so that multiple parser branches can contribute to the final error.

The exact details of this process are an implementation detail because the logic is too subtle to be worth documenting in full. That said, please do report any errors you see that you think could be improved.
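As a rough illustration only (chumsky's actual logic is an implementation detail, as stated above), the "most progress wins" rule plus Rich-style merging can be modelled in plain Rust. `ParseError`, `err`, and `merge` are hypothetical, and treating a contextually disabled branch as an error at its start position that expects nothing is an assumption of this model, not documented chumsky behaviour. Under that assumption, both cases from the question resolve to the "real" error.

```rust
// Toy model (not chumsky's implementation) of error prioritisation
// between alternative parser branches.
use std::cmp::Ordering;
use std::collections::BTreeSet;

#[derive(Debug, Clone, PartialEq)]
pub struct ParseError {
    pub pos: usize,                 // input offset where the branch gave up
    pub expected: BTreeSet<String>, // what the branch would have accepted there
}

/// Convenience constructor for the examples.
pub fn err(pos: usize, expected: &[&str]) -> ParseError {
    ParseError { pos, expected: expected.iter().map(|s| s.to_string()).collect() }
}

/// Combines errors from two alternatives: the branch that got further wins;
/// on a tie, the expected sets are merged.
pub fn merge(a: ParseError, b: ParseError) -> ParseError {
    match a.pos.cmp(&b.pos) {
        Ordering::Greater => a,
        Ordering::Less => b,
        Ordering::Equal => ParseError {
            pos: a.pos,
            expected: a.expected.union(&b.expected).cloned().collect(),
        },
    }
}
```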

zesterer • Dec 08 '25 16:12