chumsky icon indicating copy to clipboard operation
chumsky copied to clipboard

Reduce type inflation by decoupling combinator types

Open p-e-w opened this issue 4 years ago • 3 comments

Hi there! I'm writing a computer algebra system and I thought I'll give Chumsky a try. Your API design is excellent and in general I found Chumsky quite straightforward to use. Thank you for this crate!

That being said, I encountered a major problem: My simple expression parser takes almost five minutes to compile and results in a 1.3 GB debug executable. Rust-Analyzer is brought to its knees and barely capable of inspecting the code at all, but reveals that the final disjunction parser has a type that is dozens of pages when written out!

The main difference between that parser and your "nano_rust" example is that there are a few more levels of precedence, but that's about it. The language being parsed is by no means complicated or syntactically ambiguous. It appears that some mechanism in Chumsky results in exponential growth of parser complexity when nesting parsers hierarchically.

Any idea what's going on here? Is this a bug, or am I just using Chumsky incorrectly?

p-e-w avatar Nov 28 '21 08:11 p-e-w

Unfortunately, Rust deals with nested types poorly (perhaps by necessity, perhaps due to implementation problems: I'm not sure which). I believe that the work Rust does for a type is at least quadratic over the type's written length, so this can quickly inflate.

Thankfully, there is a solution!

All Chumsky parsers have a .boxed() method that performs type erasure, cutting down the size of your types significantly (at the cost of a small amount of runtime overhead in the form of a vtable lookup).

I tend to use .boxed() at various opportune points to avoid my parser types inflating too much. You can see this happening here.

If it's any consolation, Rust's Iterator API suffers from similar problems when chaining very large numbers of iterators together.

zesterer avatar Nov 29 '21 10:11 zesterer

Thank you for the suggestion, that makes perfect sense. I did notice the boxed method as I glanced through the docs, but I guess I skipped reading the full explanation which already covers my problem.

But while it's perhaps unsurprising that the compiler struggles with types that are Megabytes of text when written out, I still don't understand why this would result in a huge artifact being generated. ~~Rust generics are implemented via type erasure at compilation time, right? Seems like the complexity of the type's generic expansion should have no effect on binary size then.~~ Edit: Ah, forget it, I just realized Rust uses concretization, not erasure. That explains it then.

p-e-w avatar Nov 30 '21 04:11 p-e-w

I'm going to keep this issue open and change the title because I do think that there's still more work to do here (I can probably omit some types here and there to reduce type inflation).

zesterer avatar Nov 30 '21 16:11 zesterer

Closing in favour of #13.

zesterer avatar Feb 20 '23 22:02 zesterer