chumsky
replacing `then_with` in 0.10
congrats on the 0.10 release! I've worked on migrating part of PRQL's code to it
I've hit one issue in particular: we use then_with in PRQL to lex an "odd number of quotes". here's the code: https://github.com/PRQL/prql/blob/897b113b419f2c1c3beb3d53692657b96b11cc45/prqlc/prqlc-parser/src/lexer/mod.rs#L408-L437
what would be the idiomatic way of doing that in 0.10? I tried switching to then_with_ctx, but couldn't immediately figure out whether it was comparable and couldn't find any examples
fyi here's the current PR — if there's anything we're doing that's bad, would love to know: https://github.com/PRQL/prql/pull/5223/files#diff-8957681ce4841b0eea405779f7ceb121473f3d31476dc3113fdfde5b7d4bd451R687. (but no obligation to spend time on it ofc)
thank you!
Hello.
then_with was removed for performance/introspection reasons (it's effectively a 'black box' to chumsky, and in the future we're likely to start doing more and more up-front optimisation work on parser creation, as well as automatic static analysis of parsers, so creating parsers anew during a parse isn't a scalable long-term solution).
Its replacement comes in the form of the context-sensitive parsers, as you have guessed.
Here's a rough mock-up of a design I imagine will work. It deliberately only handles the odd-numbered case for the sake of simplicity: I think empty strings are probably best handled as another branch of the parser above this, perhaps via choice/or.
Hopefully the comments provide sufficient explanation!
let quote: char = ...;
// Parses an odd number of `quote`s, outputs the number of repeating pairs after the first quote
// i.e: 5 quotes results in an output of 2
let open = just(quote)
    .ignore_then(just([quote; 2]).repeated().count());
// Also parses an odd number of `quote`s, but takes the number of repeating pairs to expect from the context passed to it (from the `open` parser)
let close = just(quote)
    .ignore_then(just([quote; 2]).repeated().configure(|cfg, ctx| cfg.exactly(*ctx)));
// Any number of tokens, provided the token is not the start of the final closing quotes
// Outputs a `&str` slice of the parsed characters
let inner = any().and_is(close.not()).repeated().to_slice();
// A set of open quotes, the inner content, then a set of close quotes
// `open` provides its output (the number of repeating pairs) as context for `inner` and `close`.
open.ignore_with_ctx(inner.then_ignore(close))
At some point I'll get some time to write some comprehensive docs showing exactly how to go about using the context-sensitive parsers, but hopefully for now this gives you a flavour of how they might be used.
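To pin down the matching rule the mock-up above is aiming for, here is a minimal sketch in plain Rust (standard library only, no chumsky). The function name `lex_odd_quoted` and its return shape are hypothetical, chosen purely for illustration. It mirrors the three pieces of the mock-up: count the pairs after the first opening quote, treat everything up to the first matching run of quotes as content, then consume the closing run. It is a reference for the expected behaviour, not a replacement for the context-sensitive parser.

```rust
/// Hypothetical reference lexer for "odd number of quotes" strings.
/// Returns `(content, remainder)` on success, or `None` if the input
/// does not start with a quote or has no matching closing delimiter.
fn lex_odd_quoted(input: &str, quote: char) -> Option<(&str, &str)> {
    // `open`: one quote, then as many *pairs* of quotes as possible.
    let mut rest = input.strip_prefix(quote)?;
    let pair: String = [quote, quote].iter().collect();
    let mut pairs = 0;
    while let Some(after_pair) = rest.strip_prefix(pair.as_str()) {
        pairs += 1;
        rest = after_pair;
    }

    // `close`: one quote followed by exactly `pairs` pairs,
    // i.e. a run of `1 + 2 * pairs` quotes.
    let close: String = std::iter::repeat(quote).take(1 + 2 * pairs).collect();

    // `inner`: every character up to the first position where `close` matches.
    let end = rest.find(close.as_str())?;
    Some((&rest[..end], &rest[end + close.len()..]))
}
```

For example, `lex_odd_quoted("'''a'b'''", '\'')` gives `Some(("a'b", ""))`: the three opening quotes mean a lone quote inside the content does not end the string. As in the mock-up, the empty-string case (an even number of quotes) is left to a separate branch.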
thank you very much @zesterer
I made some progress, but have been really struggling with one part. I don't want to use you as free support, but also I'm spinning here
I've minimized the issue to: when I uncomment the code labelled THIS SECTION, I get the compile errors below. this is the case even though no code uses that section of code. I think I likely don't understand rust type inference sufficiently — I'm guessing the compiler is inferring something about the types from uncommenting that code, even though it doesn't use the results of it at all.
re the error — is there an issue of how far the context is being carried? I think it's complaining that the context of usize vs () is inconsistent:
// Implementation of multi-level quoted strings using context-sensitive parsers
// Based on @zesterer's suggestion for handling odd number of quotes (1, 3, 5, etc.)
fn multi_quoted_string<'a>(
    quote: &char,
    escaping: bool,
    allow_multiline: bool,
) -> impl Parser<'a, ParserInput<'a>, Vec<char>, ParserError<'a>> {
    // Parse opening quotes - first a single quote, then count any pairs of quotes
    // For example, """ would be 1 single quote + 1 pair = 3 total quotes
    let open = just::<'a, _, ParserInput, ParserError>(*quote)
        .ignore_then(just([*quote; 2]).repeated().count());
    // Parse closing quotes - matches the exact same number of quote pairs as in opening
    let close = just(*quote).ignore_then(
        just([*quote; 2])
            .repeated()
            .configure(|cfg, ctx| cfg.exactly(*ctx)),
    );
    // Define what characters are allowed in the string based on configuration
    let regular_char = if allow_multiline {
        none_of(format!("{}\\", quote))
    } else {
        none_of(format!("{}\n\r\\", quote))
    };
    // Parser for string content between quotes, accounting for close parser
    // Empty string case - even number of quotes produces empty string
    let empty_string = just::<[char; 2], ParserInput, ParserError>([*quote; 2])
        // .ignored()
        .repeated()
        .at_least(1)
        .collect::<Vec<_>>()
        .map::<Vec<char>, _>(|_| vec![]);
    // // THIS SECTION
    // let content_parser = if escaping {
    //     choice((escaped_character(), regular_char)).boxed()
    // } else {
    //     regular_char.boxed()
    // };
    // (even without swapping these lines)
    // let inner = content_parser.repeated().collect::<Vec<char>>();
    let inner = regular_char.repeated().collect::<Vec<char>>();
    // Either parse an empty string (even quotes) or a string with content (odd quotes)
    choice((
        empty_string,
        // Parse opening quotes, content, closing quotes using context
        // sensitivity
        // inner,
        open.ignore_with_ctx(inner.then_ignore(close)),
    ))
    // Choose the appropriate content parser
}
error:
[Running: cargo insta test -p prqlc-parser --features chumsky-10 --check]
Compiling prqlc-parser v0.13.5 (/Users/maximilian/workspace/prql/.worktrees/chumsky-10/prqlc/prqlc-parser)
error[E0631]: type mismatch in closure arguments
--> prqlc/prqlc-parser/src/lexer/chumsky_0_10.rs:682:48
|
645 | .configure(|cfg, ctx| cfg.exactly(*ctx)),
| ---------- found signature defined here
...
682 | open.ignore_with_ctx(inner.then_ignore(close)),
| ----------- ^^^^^ expected due to this
| |
| required by a bound introduced by this call
|
= note: expected closure signature `for<'a> fn(RepeatedCfg, &'a ()) -> _`
found closure signature `fn(RepeatedCfg, &usize) -> _`
= note: required for `IterConfigure<Repeated<Just<[char; 2], &str, ...>, ..., ..., ...>, ..., ...>` to implement `chumsky::Parser<'_, &str, (), chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), ()>>`
= note: 1 redundant requirement hidden
= note: required for `IgnoreThen<Just<char, &str, Full<Simple<'_, char>, (), ()>>, ..., ..., ...>` to implement `chumsky::Parser<'_, &str, (), chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), ()>>`
note: required by a bound in `chumsky::Parser::then_ignore`
--> /Users/maximilian/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/chumsky-0.10.0/src/lib.rs:992:26
|
992 | fn then_ignore<U, B: Parser<'src, I, U, E>>(self, other: B) -> ThenIgnore<Self, B, U, E>
| ^^^^^^^^^^^^^^^^^^^^^ required by this bound in `Parser::then_ignore`
= note: the full name for the type has been written to '/Users/maximilian/workspace/prql/.worktrees/chumsky-10/target/debug/deps/prqlc_parser-d2556d7d84bcafce.long-type-9971570004005101746.txt'
= note: consider using `--verbose` to print the full type name to the console
help: consider wrapping the function in a closure
|
682 | open.ignore_with_ctx(inner.then_ignore(|arg0: RepeatedCfg, arg1: &()| close(arg0, /* &usize */))),
| ++++++++++++++++++++++++++++++ ++++++++++++++++++++
error[E0271]: type mismatch resolving `<Full<Simple<'_, char>, (), ()> as ParserExtra<'_, &str>>::Context == usize`
--> prqlc/prqlc-parser/src/lexer/chumsky_0_10.rs:643:9
|
642 | let close = just(*quote).ignore_then(
| ----------- required by a bound introduced by this call
643 | / just([*quote; 2])
644 | | .repeated()
645 | | .configure(|cfg, ctx| cfg.exactly(*ctx)),
| |____________________________________________________^ expected `usize`, found `()`
|
= note: required for `IterConfigure<Repeated<Just<[char; 2], &str, ...>, ..., ..., ...>, ..., ...>` to implement `chumsky::Parser<'_, &str, (), chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), ()>>`
note: required by a bound in `chumsky::Parser::ignore_then`
--> /Users/maximilian/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/chumsky-0.10.0/src/lib.rs:946:26
|
946 | fn ignore_then<U, B: Parser<'src, I, U, E>>(self, other: B) -> IgnoreThen<Self, B, O, E>
| ^^^^^^^^^^^^^^^^^^^^^ required by this bound in `Parser::ignore_then`
= note: the full name for the type has been written to '/Users/maximilian/workspace/prql/.worktrees/chumsky-10/target/debug/deps/prqlc_parser-d2556d7d84bcafce.long-type-4586734992059696108.txt'
= note: consider using `--verbose` to print the full type name to the console
error[E0271]: type mismatch resolving `<Full<Simple<'_, char>, (), ()> as ParserExtra<'_, &str>>::Context == usize`
--> prqlc/prqlc-parser/src/lexer/chumsky_0_10.rs:645:14
|
645 | .configure(|cfg, ctx| cfg.exactly(*ctx)),
| ^^^^^^^^^ expected `usize`, found `()`
error[E0277]: the trait bound `ThenIgnore<Collect<chumsky::combinator::Repeated<chumsky::primitive::NoneOf<std::string::String, &str, chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), ()>>, char, &str, chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), ()>>, char, Vec<char>>, IgnoreThen<chumsky::primitive::Just<char, &str, chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), ()>>, IterConfigure<chumsky::combinator::Repeated<chumsky::primitive::Just<[char; 2], &str, chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), ()>>, [char; 2], &str, chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), ()>>, {closure@prqlc/prqlc-parser/src/lexer/chumsky_0_10.rs:645:24: 645:34}, [char; 2]>, char, chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), ()>>, (), chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), ()>>: chumsky::Parser<'_, &str, _, chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), usize>>` is not satisfied
--> prqlc/prqlc-parser/src/lexer/chumsky_0_10.rs:682:30
|
682 | open.ignore_with_ctx(inner.then_ignore(close)),
| --------------- ^^^^^^^^^^^^^^^^^^^^^^^^ unsatisfied trait bound
| |
| required by a bound introduced by this call
|
= help: the trait `chumsky::Parser<'_, &str, _, chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), usize>>` is not implemented for `ThenIgnore<Collect<Repeated<..., ..., ..., ...>, ..., ...>, ..., ..., ...>`
I pushed the latest code to https://github.com/PRQL/prql/pull/5223/commits/5018a5a3acaee1a2fa84939356e1ecfca98040c3 if that's helpful at all (https://github.com/PRQL/prql/pull/5223/ is the PR, though hopefully the code will change at that link...)
I think the problem is your use of the explicit parameters here:
let open = just::<'a, _, ParserInput, ParserError>(
Note how in the definition of just, the E parameter must implement ParserExtra - it's no longer used to carry just the error type, but also a whole host of 'extra' types that matter to the parser - the error type, the parser state, and also the parser context. By forcing it to be ParserError, you've explicitly forced the context to have type () (because, I assume, ParserError is an alias for extra::Err<MyError>). Effectively, you've forced chumsky to use the default context type, which is ().
In general, we recommend against explicitly annotating type parameters in this way, for reasons like this. There are rare cases in which it's necessary, but they're the exception rather than the norm.
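The mechanism described above can be illustrated with a toy model in plain Rust. Everything here is a hypothetical stand-in: `Extra`, `ErrOnly`, `WithCount` and `from_ctx` are not chumsky items, and the real `ParserExtra` trait carries more than a context type. The sketch only shows the general effect: once the "extra" type parameter is pinned explicitly, its associated context type is pinned with it, so a closure expecting a `&usize` context no longer fits.

```rust
use std::marker::PhantomData;

// Toy stand-in for a trait that bundles several types into one parameter.
trait Extra {
    type Context;
}

// Toy stand-in for an "error only" alias: the context defaults to `()`.
struct ErrOnly;
impl Extra for ErrOnly {
    type Context = ();
}

// Toy stand-in for an extra type whose context is a `usize` count.
struct WithCount;
impl Extra for WithCount {
    type Context = usize;
}

// A value that derives an expected repetition count from the context.
struct FromCtx<E, F> {
    read: F,
    _extra: PhantomData<E>,
}

fn from_ctx<E, F>(read: F) -> FromCtx<E, F>
where
    E: Extra,
    F: Fn(&E::Context) -> usize,
{
    FromCtx { read, _extra: PhantomData }
}

impl<E, F> FromCtx<E, F>
where
    E: Extra,
    F: Fn(&E::Context) -> usize,
{
    fn expected_pairs(&self, ctx: &E::Context) -> usize {
        (self.read)(ctx)
    }
}

fn demo() -> (usize, usize) {
    // With a `usize` context, the closure can read the count from it.
    let counted = from_ctx::<WithCount, _>(|ctx| *ctx);
    // With the extra pinned to `ErrOnly`, the closure only ever sees `&()`.
    let unit = from_ctx::<ErrOnly, _>(|_ctx| 0);
    // Pinning `ErrOnly` while writing a closure over `&usize`, e.g.
    // `from_ctx::<ErrOnly, _>(|ctx: &usize| *ctx)`, is rejected by the
    // compiler with a closure-signature mismatch (`&()` vs `&usize`).
    (counted.expected_pairs(&2), unit.expected_pairs(&()))
}
```

In this model, leaving the extra type to inference (or choosing one whose context is `usize`) is what lets the context-reading closure type-check, which is the same shape as the `expected &() / found &usize` mismatch in the error output earlier in the thread.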
that makes sense not to include explicit params. I think I had included them to try to understand what they should be.
but even after removing them, it doesn't help: https://github.com/PRQL/prql/pull/5223/files#diff-8957681ce4841b0eea405779f7ceb121473f3d31476dc3113fdfde5b7d4bd451R631-R686
the only remaining explicit params are on the output, in:
.collect::<Vec<_>>()
.map::<Vec<char>, _>(|_| vec![]);
any ideas? I realize it's very unleveraged to debug individual libraries consuming chumsky. to the extent this is a specific case of a general issue, feel free to point me towards the general issue... TY!
I assume no change in the error message? The next step is to start removing things until it works. close, for example, is unnecessary for the parser to compile. The .configure(...) calls can also be temporarily commented out. It's a bit difficult for me to give specific advice without seeing what you're seeing in front of you. When errors like this appear and the solution is non-obvious, I start commenting bits out until I can binary-search my way to a solution.
I see that you've got explicit type parameters in two places: I assume that you tried removing both?