Add support for zero-copy parsing

Open zesterer opened this issue 2 years ago • 4 comments

Currently, this consists of only some initial experiments outside of the main codebase.

The new design departs quite a lot from master, to the point that it's likely to end up being a near-complete rewrite of the crate. However, benchmarks so far are promising!

[Image: benchmark results comparing chumsky's zero-copy JSON parser with serde_json]

(this is not a particularly fair comparison: serde_json always allocates Strings whereas chumsky's zero-copy parser just refers back to the original JSON's bytes with &[u8], and serde_json's object map is slightly more expensive, but regardless it's a nice demonstration)
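To make the allocation difference concrete, here is a minimal sketch of the two styles. The function names `parse_string_borrowed` and `parse_string_owned` are hypothetical and are not part of chumsky or serde_json; the sketch also ignores escape sequences, which a real JSON parser would have to handle.

```rust
/// Zero-copy: the output is a sub-slice of `input`, so nothing is allocated.
/// Returns the string body and the remaining input.
fn parse_string_borrowed(input: &[u8]) -> Option<(&[u8], &[u8])> {
    // Expect an opening quote, then scan for the closing one.
    let rest = input.strip_prefix(b"\"")?;
    let end = rest.iter().position(|&b| b == b'"')?;
    Some((&rest[..end], &rest[end + 1..]))
}

/// Allocating: the same parse, but the bytes are copied into a new `String`.
fn parse_string_owned(input: &[u8]) -> Option<(String, &[u8])> {
    let (body, rest) = parse_string_borrowed(input)?;
    Some((String::from_utf8(body.to_vec()).ok()?, rest))
}

fn main() {
    let src: &[u8] = b"\"hello\": 1";

    let (borrowed, rest) = parse_string_borrowed(src).unwrap();
    // The borrowed output points directly into the original buffer.
    assert!(std::ptr::eq(borrowed.as_ptr(), src[1..].as_ptr()));
    println!("borrowed {} bytes, {} bytes left", borrowed.len(), rest.len());

    let (owned, _) = parse_string_owned(src).unwrap();
    println!("owned copy: {owned}");
}
```

The borrowed version ties the output's lifetime to the input buffer, which is the trade-off that lets a zero-copy parser skip allocation.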

Along with support for zero-copy parsing, this new design also permits the following:

  • A state type parameter, allowing mutable state to be passed down through the parser (useful for interning and more)
  • A check-only mode that skips generating output
  • Regex parsers
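As a rough illustration of how the first two items could fit together, the sketch below threads a mutable interner through a tiny identifier parser and uses a mode parameter to decide whether output is produced at all. Every name here (`Mode`, `Emit`, `Check`, `Interner`, `ident`) is hypothetical and chosen for illustration; this is not a description of the experimental branch's actual API. It relies on GATs, so it assumes a toolchain where they are available.

```rust
use std::collections::HashMap;

/// Hypothetical mode trait: decides whether an output value is materialised.
trait Mode {
    type Output<T>;
    fn bind<T>(f: impl FnOnce() -> T) -> Self::Output<T>;
}

/// Normal mode: run the closure and keep its result.
struct Emit;
/// Check-only mode: never run the closure, produce `()` instead.
struct Check;

impl Mode for Emit {
    type Output<T> = T;
    fn bind<T>(f: impl FnOnce() -> T) -> Self::Output<T> {
        f()
    }
}

impl Mode for Check {
    type Output<T> = ();
    fn bind<T>(_f: impl FnOnce() -> T) -> Self::Output<T> {}
}

/// Hypothetical parser state: maps each distinct identifier to a small id.
#[derive(Default)]
struct Interner<'a> {
    ids: HashMap<&'a str, usize>,
}

impl<'a> Interner<'a> {
    fn intern(&mut self, s: &'a str) -> usize {
        let next = self.ids.len();
        *self.ids.entry(s).or_insert(next)
    }
}

/// Parse a leading run of ASCII letters, interning it only if the mode emits output.
fn ident<'a, M: Mode>(
    input: &'a str,
    state: &mut Interner<'a>,
) -> Option<(M::Output<usize>, &'a str)> {
    let end = input
        .find(|c: char| !c.is_ascii_alphabetic())
        .unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    let (word, rest) = input.split_at(end);
    Some((M::bind(|| state.intern(word)), rest))
}

fn main() {
    let mut state = Interner::default();
    let (id, rest) = ident::<Emit>("foo bar", &mut state).unwrap();
    println!("emit: id = {id}, rest = {rest:?}");

    // Check mode validates the input but does no interning work.
    let ((), rest) = ident::<Check>("baz!", &mut state).unwrap();
    println!("check: rest = {rest:?}, interned so far = {}", state.ids.len());
}
```

In this sketch the check-only path skips the closure entirely, so the cost of building output (here, the interner lookup) is only paid when the caller asks for it.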

Unfortunately, the code requires GATs, a feature that is currently unstable (but might not be for much longer!).

  • [ ] Reimplement all primitives
  • [ ] Reimplement all combinators
  • [ ] Reimplement error prioritisation
  • [ ] Reimplement recovery
  • [ ] Reimplement all text parsers

Closes #9

zesterer avatar Feb 08 '22 03:02 zesterer

I want to try out this feature, but the following errors are reported. How should I solve them?

error[E0309]: the parameter type `S` may not live long enough
   --> src\input.rs:233:64
    |
233 | ... E: Error<I::Token>, S, P: Parser<'a, I, E, S> + ?Sized>(parser: &P, inp: &mut InputRef<'a, '_, I, S>) -> PResult<Self, P::Output, E> {
    |                         -     ^^^^^^^^^^^^^^^^^^^ ...so that the type `S` will meet its required lifetime bounds...
    |                         |
    |                         help: consider adding an explicit lifetime bound...: `S: 'a`
    |
note: ...that is required by this bound
   --> src\input.rs:238:69
    |
238 | pub trait Parser<'a, I: Input + ?Sized, E: Error<I::Token> = (), S: 'a = ()> {
    |                                                                     ^^
error: missing required bound on `Iter`
   --> src\input.rs:402:5
    |
402 |     type Iter<'a>: Iterator<Item = T>;
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^-
    |                                      |
    |                                      help: add the required where clause: `where Self: 'a`
    |
    = note: this bound is currently required to ensure that impls have maximum flexibility
    = note: we are soliciting feedback, see issue #87479 <https://github.com/rust-lang/rust/issues/87479> for more information

oovm avatar Feb 22 '22 05:02 oovm

@oovm Hey, I've just pushed up some fixes. It seems like I was working against an older nightly when developing this.

You can use cargo bench --features nightly to see the benchmark in action.

Obviously, please be aware that this is very early work, is missing a lot of combinators, and is likely going to be changing a lot before being merged. It's definitely not much more than an experiment right now, and I've only implemented the features required for the json benchmark to work.
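For anyone who hits the same two errors elsewhere, the compiler's suggestions can be applied directly: add `where Self: 'a` to the generic associated type, and repeat the `S: 'a` bound on any generic function that names the trait. Below is a minimal sketch of both fixes. The traits `Tokens` and `Parser`, the `run` function, and `CountChars` are simplified, hypothetical stand-ins rather than the definitions in `src/input.rs`, and the sketch assumes a toolchain on which GATs are available.

```rust
/// Hypothetical stand-in for a token source.
trait Tokens {
    type Token;
    // Without `where Self: 'a`, rustc reports "missing required bound on `Iter`".
    type Iter<'a>: Iterator<Item = Self::Token>
    where
        Self: 'a;
    fn tokens<'a>(&'a self) -> Self::Iter<'a>;
}

impl<T: Clone> Tokens for Vec<T> {
    type Token = T;
    type Iter<'a> = std::iter::Cloned<std::slice::Iter<'a, T>>
    where
        Self: 'a;
    fn tokens<'a>(&'a self) -> Self::Iter<'a> {
        self.as_slice().iter().cloned()
    }
}

/// Hypothetical parser trait whose state parameter must outlive `'a`.
trait Parser<'a, S: 'a = ()> {
    fn parse(&self, input: &'a str, state: &mut S) -> usize;
}

/// Because the trait demands `S: 'a`, a generic caller has to state the bound too;
/// leaving it off is the situation the E0309 message describes.
fn run<'a, S: 'a, P: Parser<'a, S> + ?Sized>(p: &P, input: &'a str, state: &mut S) -> usize {
    p.parse(input, state)
}

/// Toy parser: counts characters and records how often it was called.
struct CountChars;

impl<'a> Parser<'a, usize> for CountChars {
    fn parse(&self, input: &'a str, calls: &mut usize) -> usize {
        *calls += 1;
        input.chars().count()
    }
}

fn main() {
    let v = vec![1u8, 2, 3];
    let collected: Vec<u8> = v.tokens().collect();
    println!("tokens: {collected:?}");

    let mut calls = 0usize;
    let n = run(&CountChars, "abc", &mut calls);
    println!("{n} chars after {calls} call(s)");
}
```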

zesterer avatar Feb 22 '22 09:02 zesterer

When zero-copy is merged, is it intended to fully replace the non-zero-copy code? If so, do you think it's a good time to remove the old code in this branch, lower the modules to root, and move the documentation over to the new modules?

CraftSpider avatar Jun 14 '22 17:06 CraftSpider

> When zero-copy is merged, is it intended to fully replace the non-zero-copy code? If so, do you think it's a good time to remove the old code in this branch, lower the modules to root, and move the documentation over to the new modules?

Yes, I think this would be reasonable. I'm still not certain about the overall design though: the Input trait still feels like it has odd corner cases, we don't have error recovery systems yet (there's an interesting discussion in #159 about some potential alternative strategies), and I think it's worth revisiting some of the API features like from_nested to make them more of a happy path. I definitely don't see much, if any, of the old code surviving though.

Edit: One thing that will need moving over is the set of parsers in text. I think it would be nice to implement them in terms of custom though, if only for better compilation times and efficiency.

zesterer avatar Jun 14 '22 21:06 zesterer