prql icon indicating copy to clipboard operation
prql copied to clipboard

Potentially using Chumsky

Open max-sixty opened this issue 2 years ago • 11 comments

There are great readability advantages to having a parser grammar, and we're still iterating between the language design and the parser, so we're not going to rewrite it at this stage.

But this project looks really impressive: https://github.com/zesterer/chumsky, from @zesterer (who seems to be actively supporting & developing it, unlike pest & nom)

We're already hitting the limits of pest — e.g. on https://github.com/pest-parser/pest/issues/271 has been frustrating, and my current work refactoring how functions are handled is not efficient — so If we find the existing error messages aren't good enough, it's worth considering.

max-sixty avatar Feb 26 '22 21:02 max-sixty

I'm interested to hear how you get on if you give it a go!

zesterer avatar Feb 27 '22 17:02 zesterer

Totally unaware of this issue, I actually did an incomplete proof-of-concept in chumsky last night. https://github.com/AlexRiedler/prql-chumsky (just the lexer and parser for now). I am hoping by the end of the month I will have the complete lexer and parser done.

AlexRiedler avatar Jul 11 '22 00:07 AlexRiedler

Awesome @AlexRiedler !

Copying what I wrote on Discord:

If someone wanted to do the work — maybe they feel like a big parser project — I would vote to make the change.

Though I'm not the only vote — CC @aljazerzen — so if you're doing this with an expectation it would merge, we should discuss first.

A couple of thoughts:

  • If there's any way of using it in part, to start — e.g. for the later part of the parsing, such as something like https://github.com/prql/prql/blob/6b374b176b02af92b454c2ff279eaad5500b9f6d/prql-compiler/src/parser.rs#L167-L200, that would allow for a more gradual on-ramp — we could make incremental progress rather than aiming for a single switch-over. But I'm not sure how viable that is.
  • We have lots of tests in parser.rs that aren't coupled to our current parsing which you could use (though they are coupled to the AST, but that would probably need to be invariant anyway)
  • We try to be abstract over Transforms in the early stage of parsing, rather than parse select any differently to derive

max-sixty avatar Jul 11 '22 01:07 max-sixty

Swamped more than I hoped the last month, ended up going on vacation so no real update on the state of my experiment.

we could make incremental progress rather than aiming for a single switch-over. But I'm not sure how viable that is.

I think this should be possible, let me try this out and see where I can get in a relatively short period of time.

We try to be abstract over Transforms in the early stage of parsing, rather than parse select any differently to derive

let me think about this a bit more of that makes sense, probably need to read more about the spec and the interactions between part parser vs lexer parts.

AlexRiedler avatar Aug 01 '22 23:08 AlexRiedler

+1 on this issue. A parser that supports graceful error recovery will be important if PRQL ever grows a type system and a wants to deliver IDE features like autocompletion. Chumsky seems like it could be a good fit for this.

petehunt avatar Sep 03 '22 00:09 petehunt

Feel free to let me know if you give it a try and run into issues (open a discussion thread or issue) :) I try to make the crate as easy to work with as I can, but it has a reputation for sometimes generating incomprehensible Rust errors!

zesterer avatar Sep 03 '22 09:09 zesterer

I remembered we discussed whether we could use Chumsky "gradually" on a dev call a couple of weeks ago — i.e. Pest at the front-end and Chumsky consuming Pest Pairs . I had thought that we could do this but others seemed skeptical!

The advantage of this would be we could have a few small PRs, maybe even multiple contributor, gradually growing Chumsky out to replace pest — rather than waiting for one big PR and having to resolve merge conflicts. Places like the reference in https://github.com/prql/prql/issues/131#issuecomment-1179848760 have enough logic there that it could be a good place to start.

I couldn't find an issue in Chumsky where this has been discussed. But there are lots of examples of parsing a tree of tokens from a lexer; for example https://github.com/zesterer/chumsky/blob/master/examples/pythonic.rs. So I think it would be possible, though I'm not confident and don't have a view on the tradeoff of additional effort.

(If @zesterer has any thoughts here, that would be great, but no expectations)

max-sixty avatar Dec 17 '22 22:12 max-sixty

For parsing token trees, you'll probably want Stream::from_nested. Chumsky can currently only consume flat strings of tokens, but from_nested provides a fairly convenient way to glue something that emits token trees to a chumsky parser with less than a dozen lines of glue code.

I'm happy to provide help with any other questions you have too.

zesterer avatar Dec 19 '22 12:12 zesterer

I've been playing with chumsky a little in the past week and I find that implementation of a parser heavily relies on the types in the input stream - you are basically just defining convertors from input stream token(s) to output token.

I feel that if we go the incremental route, we will have to do a major rewrite again to swap out pest's pairs with lexer tokens or plain chars.

So my advice would be to stick to two stage chumsky parser:

  • lexer (chars -> tokens)
  • parser (tokens -> Expr/Stmt/Typ)

aljazerzen avatar Dec 22 '22 09:12 aljazerzen

I'd tend to agree. There's a lot of old literature out there that talks about incremental parsing as if it's some sort of golden goose, but unless you're writing a language server (and even then!) you pretty much don't need it on modern hardware. If parsing is anywhere close to being your bottleneck in a language with non-trivial semantics, something's gone seriously wrong!

zesterer avatar Dec 22 '22 10:12 zesterer

I don't think we are talking about the same thing. With "incremental" I meant "incrementally changing out pest parser with chumsky" as opposed to "a single large commit that rewrites our parser in chumsky".

aljazerzen avatar Dec 22 '22 11:12 aljazerzen

This can be closed?

eitsupi avatar Jun 15 '23 04:06 eitsupi

Yes!

max-sixty avatar Jun 15 '23 06:06 max-sixty