chumsky icon indicating copy to clipboard operation
chumsky copied to clipboard

Feature request: Allow users to implement custom recovery strategies

Open jfecher opened this issue 2 years ago • 6 comments

As it stands now, users only have access to skip_until, skip_then_retry_until, and nested_delimiters. I'm unsure if the best way to go about this would be exposing the Strategy trait (which currently uses private types) - preferably in a cleaner form, or adding lower-level recovery strategies/parsers that can be used to build up more complex strategies. As it stands, I believe it isn't possible to implement a strategy like expect from https://eyalkalderon.com/blog/nom-error-recovery/. skip_until is close but provides no way to not skip any tokens.

jfecher avatar Jan 24 '22 17:01 jfecher

I've been thinking a bit on the correspondence between recovery strategies and parsers themselves. For starters, recovery strategies appear to be normal parsers that are called on otherwise failing input. In that respect it'd be a shame if we couldn't reuse normal parsers with them and build up bigger recovery strategies the same way bigger parsers are built up. Using this, I think the simplest error recovery strategy may be a simple a.or(b)-like strategy in which if a fails, the error is still logged but the parser b is tried in an attempt to recover from any errors.

Using this function we can build up more complicated error recovery strategies like the expect mentioned above. Since an or parser already exists, I'll use recover_with instead, but taking in any Parser, rather than any Strategy. If that second parser successfully parses, the parse is recovered:

let expect = |p, default| p.recover_with(empty(), default);

just(Let)
    .then(expect(ident(), Ident::error))
    .then(expect(just(Equals), ...)) //etc.

// skip_until is easily definable
let skip_until = |tokens, f| none_of(tokens).repeated().map_with_span(|_, span| f(span));

jfecher avatar Jan 25 '22 16:01 jfecher

Apologies about not replying to this earlier, GitHub decided to omit a swathe of things from my notifications for some reason.

I have been having similar thoughts about this too. I'm very tempted to take another swing at error recovery, with a more parser-driven approach. If I get time over the next week, I'll do some experimenting and see what I can come up with!

Alternatively, if you have any specific thoughts (I like the expect notation) I'd love to hear them!

zesterer avatar Feb 07 '22 15:02 zesterer

I don't have many specific thoughts on recovery that haven't been mentioned yet. Regarding discussion of a new release in #76, it would be out of scope to go through with the changes above of course, but it would be nice if an empty recovery strategy (like expect above or an error-logging version of .or_not()) was included. The only way I found of emulating one currently is very obtuse and involves parsing the input twice:

// Forces a parser to succeed, logging any errors it had
pub fn force<'a, P, T: 'a>(parser: P) -> impl Parser<Option<T>> + 'a
    where P: Parser<T> + Clone + 'a,
{
    parser
        .clone()
        .map(Ok)
        .or_else(|err| Ok(Err(err)))
        .rewind()
        .then_with(move |result| match result {
            Ok(_) => parser.clone().map(Ok).boxed(),
            Err(err) => empty().map(move |_| Err(err.clone())).boxed(),
        })
        .validate(move |result, _, emit| match result {
            Ok(t) => Some(t),
            Err(err) => {
                emit(err);
                None
            }
        })
}

I'm sure this could hopefully be implemented via Custom or manually implementing Parser, but I couldn't figure it out. Anecdotally I'm using this function in quite a few places, mostly as a last resort for other strategies. I find it's often helpful for rules you don't want to ever fail - e.g. committing to a certain parse rule. I often use a pattern like:

// a.then_commit(b, f) = a.then( force(b.recover_with(skip_then_retry_until(...)).map(|opt| opt.unwrap_or_else(f)) ); 

just(Token::Let)
    .ignore_then_commit(ident(), || Ident::Error)
    .then_commit_ignore(just(Token::Equals))
    .then_commit(expression(), || Expression::Error)

Where then_commit and friends parse their lhs and if that succeeds "commit" to parsing their rhs by recovering where necessary. So the only failable part of the above would be just(Token::Let). Anyways, not sure if any of this is helpful, this is just how I've found myself using the library so far. Thank you for your work on it!

jfecher avatar Feb 07 '22 18:02 jfecher

For now, would some sort of .recover_with(fallback(impl Parser, || Error)) work for you?

zesterer avatar Feb 07 '22 20:02 zesterer

Definitely, I was even suggesting something simpler in the meantime like .recover_with(optional(|span| default_value)) would work. Whichever solution you think would be a better fit though will work for me

jfecher avatar Feb 07 '22 20:02 jfecher

Related PR: #107 which implemets the recover_with parser in this issue as recover_via

jfecher avatar Mar 16 '22 17:03 jfecher

Any chance to get recover_via PR (or alternative solution) merged soon? I didn't even suspect that subtle problem of more sophisticated recovery strategies will hit me too :)

mbuczko avatar Oct 18 '22 15:10 mbuczko

I'm happy to accept a pull request (it shouldn't be too hard to implement), but I don't personally have time to implement it for at least a week.

That aside, this is the approach I want to go for when implementing recovering strategies for zero-copy, so this feature will be coming anyway... eventually. I'm happy to accept a PR for the current master branch though.

zesterer avatar Oct 19 '22 08:10 zesterer

#233 effectively addresses this issue, although I'm going to keep it open for the sake of discussing the design for zero-copy, which I'd like to be a bit more capable/terse.

zesterer avatar Dec 19 '22 12:12 zesterer

Implementing recovery strategies in terms of parsers is now the only way of performing error recovery in zero-copy, so this issue is solved in both master and zero-copy.

zesterer avatar Feb 20 '23 22:02 zesterer