
Investigate type inflation and strategies to reduce it

Open zesterer opened this issue 4 years ago • 16 comments

zesterer avatar Nov 01 '21 09:11 zesterer

Interesting Twitter thread about this: compilation times might improve with better memoisation of trait obligations in rustc.

zesterer avatar Mar 14 '22 22:03 zesterer

That Twitter thread conflates two different perf problems. As https://github.com/fasterthanlime/netherquote identifies, the chumsky slowness looks very much to be related to large amounts of LLVM IR being generated, rather than the rustc front-end being slow.

nnethercote avatar Mar 15 '22 05:03 nnethercote

This is interesting to me: it seems that this behaviour is different to previous cases I've seen, where the bulk of compilation time was spent compiling a lib crate rather than the final binary, which implied to me that trait resolution can also be a bottleneck. Edit: in hindsight, this might just have been because the parser code ended up getting monomorphised in the lib crate rather than the final binary anyway, so perhaps this is not relevant.

I've gone through the recommendations that you mentioned in https://github.com/fasterthanlime/netherquote/issues/1, although unfortunately I don't believe that the trick of moving code out of generic functions is applicable, given that combinators are generic over the parsers that compose them. In the one case in which it was applicable, I managed to cut the LLVM IR generated by the nano_rust example down from 272,701 lines to 254,948, which is definitely a welcome saving, but not quite of the order that I was hoping for.

zesterer avatar Mar 15 '22 11:03 zesterer

Disappointingly, invoking parsers through dynamic dispatch also didn't seem to cut down on the generated IR. This is quite a surprising result though, so perhaps I missed something that's causing the compiler to monomorphize without realising it.

Edit: My theory is that Rust is eagerly generating a vtable entry for every implementer of Parser that ever gets used, even if it's not coerced to a trait object at any point. Does anybody know whether this is true?

zesterer avatar Mar 15 '22 12:03 zesterer

so perhaps I missed something that's causing the compiler to monomorphize without realising it.

I've been meaning to try -Zshare-generics on this https://github.com/bevyengine/bevy/blob/9dfd4e4b08aba146e4d14e457b7b2ac6560e8ea3/.cargo/config_fast_builds#L8

I have no idea if it's even relevant here, but it popped in my head as a "related thing" and who am I to ignore my own head.
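For reference, the linked Bevy config enables the flag roughly like this (a sketch only: the flag itself is real but nightly-only, and the target triple here is just an example):

```toml
# .cargo/config.toml (requires a nightly toolchain for -Z flags)
[target.x86_64-unknown-linux-gnu]
rustflags = ["-Zshare-generics=y"]
```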

fasterthanlime avatar Mar 15 '22 12:03 fasterthanlime

This simple example seems to confirm my theory: Rust generates vtables for types that implement a trait regardless of whether instances of those types are actually ever coerced to a trait object in practice (a vtable entry for both u8 and u16 end up in the final LLVM IR, despite the fact that only u8 is ever coerced to a trait object).
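For reference, the experiment looks something like the following (my reconstruction of the idea, not the exact example linked above; the `Describe` trait and its impls are invented for illustration). Compiling this with `rustc --emit=llvm-ir` and searching for `vtable` is the check being described:

```rust
// Both u8 and u16 implement the trait, but only u8 is ever coerced to a
// trait object. The claim is that the emitted LLVM IR nonetheless contains
// vtable entries for both types.
trait Describe {
    fn describe(&self) -> String;
}

impl Describe for u8 {
    fn describe(&self) -> String {
        format!("u8: {self}")
    }
}

impl Describe for u16 {
    fn describe(&self) -> String {
        format!("u16: {self}")
    }
}

// u16 is only ever used through static dispatch...
fn use_generic<T: Describe>(x: T) -> String {
    x.describe()
}

fn main() {
    assert_eq!(use_generic(7u16), "u16: 7");
    // ...while u8 alone is coerced to a trait object.
    let dyn_u8: &dyn Describe = &3u8;
    assert_eq!(dyn_u8.describe(), "u8: 3");
}
```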

It seems that this is the reason for the sheer volume of LLVM IR being generated: Rust seems to instantiate a new vtable for every single monomorphised implementer of Parser, even if it's unnecessary to do so.

I think it might be possible for me to trick Rust into not doing this by being sneaky with a secondary helper trait for the bits that strictly require dynamic dispatch, although it feels like the compiler itself should be perfectly capable of doing the sensible thing here, but doesn't.

I've been meaning to try -Zshare-generics on this

I believe that this will only have an impact across many compilation units, which might not be relevant in this context, but perhaps I'm wrong about exactly what it does.

zesterer avatar Mar 15 '22 13:03 zesterer

@nnethercote Do you think it's worth me opening an issue about Rust generating unnecessary vtables?

Edit: I thought initially that this might be because Rust can't know which other compilation units coerce a type to a trait object, but in hindsight this shouldn't be an issue because each compilation unit should be generating its own vtable entry anyway, so I feel like this optimisation should still be valid.

zesterer avatar Mar 15 '22 13:03 zesterer

@fasterthanlime Some success you might be interested in. I swapped out the Parser trait for a substitute when doing dynamic dispatch in chumsky. This seems to have worked to some extent, tricking the compiler into generating fewer vtables. The number of LLVM lines generated for netherquote went from 322,815 to 265,804 when patching in my local version of the crate. I'm going to see whether I can do better than this though.

zesterer avatar Mar 15 '22 13:03 zesterer

@nnethercote Do you think it's worth me opening an issue about Rust generating unnecessary vtables?

Yes! Even if it's a known issue, having another example demonstrating it would be useful.

nnethercote avatar Mar 15 '22 20:03 nnethercote

I'm also now suspecting that Rust is doing this not just for vtables, but for all traits implemented by any type that is even mentioned... I don't see how or why the compiler would be generating that much LLVM IR if that were not the case.

zesterer avatar Mar 15 '22 20:03 zesterer

I'm also now suspecting that Rust is doing this not just for vtables, but for all traits implemented by any type that is even mentioned... I don't see how or why the compiler would be generating that much LLVM IR if that were not the case.

@zesterer I think this is related to https://github.com/rust-lang/rust/issues/70333

WaffleLapkin avatar Mar 16 '22 10:03 WaffleLapkin

@zesterer I think this is related to rust-lang/rust#70333

This does indeed appear to be similar, although this seems to occur for generic traits too (or, at least, LLVM IR is emitted, even if the functions are omitted from the final binary).

zesterer avatar Mar 16 '22 11:03 zesterer

I don't know if it can be just LLVM IR. I have a project that uses Chumsky and cargo check takes 2.5 minutes. Ok my computer is fairly old, and I'm running Windows, but even so... It's not a big parser - only 500 LOC, and actually it is only a tokeniser. But the final type is

Recovery<Or<Or<Or<Or<Or<Or<Or<Map<Map<Then<Just<char, char, {unknown}>, impl Parser<char, <char as Character>::Collection, Error = {unknown}> + Copy + Clone>, fn((char, String)) -> String, (char, String)>, TyVal(String) -> Token, String>, Map<Map<Then<Just<char, &str, {unknown}>, impl Parser<char, <char as Character>::Collection, Error = {unknown}> + Copy + Clone>, fn((&str, String)) -> String, (&str, String)>, Hex(String) -> Token, String>>, Map<Map<Then<Just<char, &str, {unknown}>, impl Parser<char, <char as Character>::Collection, Error = {unknown}> + Copy + Clone>, fn((&str, String)) -> String, (&str, String)>, Bin(String) -> Token, String>>, Map<Map<Map<Then<OrNot<Just<char, char, {unknown}>>, Map<Then<impl Parser<char, <char as Character>::Collection, Error = {unknown}> + Copy + Clone, Map<Then<Just<char, char, {unknown}>, impl Parser<char, <char as Character>::Collection, Error = {unknown}> + Copy + Clone>, fn((char, String)) -> Vec<char>, (char, String)>>, fn((String, Vec<char>)) -> Vec<char>, (String, Vec<char>)>>, fn((Option<char>, Vec<char>)) -> Vec<char>, (Option<char>, Vec<char>)>, fn(Vec<char>) -> String, Vec<char>>, Real(String) -> Token, String>>, Map<Map<Map<Then<OrNot<Just<char, char, {unknown}>>, impl Parser<char, <char as Character>::Collection, Error = {unknown}> + Copy + Clone>, fn((Option<char>, String)) -> Vec<char>, (Option<char>, String)>, fn(Vec<char>) -> String, Vec<char>>, Num(String) -> Token, String>>, {unknown}>, Map<impl Parser<{unknown}, <{unknown} as Character>::Collection, Error = {unknown}> + Copy + Clone, impl Fn(String) -> Token, String>>, Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<Or<To<Just<char, &str, {unknown}>, &str, Token>, To<Just<char, &str, {unknown}>, &str, Token>>, To<Just<char, &str, {unknown}>, &str, Token>>, To<Just<char, &str, {unknown}>, &str, Token>>, To<Just<char, &str, {unknown}>, &str, Token>>, To<Just<char, &str, {unknown}>, &str, Token>>, 
To<Just<char, &str, {unknown}>, &str, Token>>, To<Just<char, &str, {unknown}>, &str, Token>>, To<Just<char, &str, {unknown}>, &str, Token>>, To<Just<char, &str, {unknown}>, &str, Token>>, To<Just<char, &str, {unknown}>, &str, Token>>, To<Just<char, &str, {unknown}>, &str, Token>>, To<Just<char, &str, {unknown}>, &str, Token>>, To<Just<char, &str, {unknown}>, &str, Token>>, To<Just<char, char, {unknown}>, char, Token>>, To<Just<char, char, {unknown}>, char, Token>>, To<Just<char, char, {unknown}>, char, Token>>, To<Just<char, char, {unknown}>, char, Token>>, To<Just<char, char, {unknown}>, char, Token>>, To<Just<char, char, {unknown}>, char, Token>>, To<Just<char, char, {unknown}>, char, Token>>, To<Just<char, char, {unknown}>, char, Token>>, To<Just<char, char, {unknown}>, char, Token>>, To<Just<char, char, {unknown}>, char, Token>>, To<Just<char, char, {unknown}>, char, Token>>, To<Just<char, char, {unknown}>, char, Token>>, To<Just<char, char, {unknown}>, char, Token>>, To<Just<char, char, {unknown}>, char, Token>>, To<Just<char, char, {unknown}>, char, Token>>, To<Just<char, char, {unknown}>, char, Token>>, To<Just<char, char, {unknown}>, char, Token>>, To<Just<char, char, {unknown}>, char, Token>>, To<Just<char, char, {unknown}>, char, Token>>, To<Just<char, char, {unknown}>, char, Token>>, To<Just<char, char, {unknown}>, char, Token>>, To<Just<char, char, {unknown}>, char, Token>>, To<Just<char, char, {unknown}>, char, Token>>>, SkipThenRetryUntil<{unknown}, 0>>

I'm not that surprised the cargo check takes so long. I wonder if it would be possible to manually type erase things at strategic points to keep the types a sane size. I think that's what they were planning to do in the Xilem GUI project: in theory the entire GUI becomes one big static type, just like this, but IIRC to keep compile times sane they were planning to let you break your GUI up into a small number of View types that are dynamically dispatched instead of statically.

(I don't really know what I'm talking about so I may be way off here.)

And yes I need to sort out that crazy .or() chain, but it wasn't obvious when I was writing the code that it would cause this issue...

Timmmm avatar Aug 26 '23 20:08 Timmmm

I wonder if it would be possible to manually type erase things at strategic points to keep the types a sane size.

The Parser trait has a .boxed() method that lets you erase the type of parser and use dynamic dispatch, and you can manually intersperse it throughout your code where you like :)

And yes I need to sort out that crazy .or() chain...

A long .or() chain like this...

a()
    .or(b())
    .or(c())
    .or(/* ... */)

can instead be written with choice() like so:

choice((
    a(),
    b(),
    c(),
    // ...
))

(This produces a much flatter type: a single combinator over a tuple of parsers, rather than a nested chain that grows one Or layer deeper per call.)

Combining these two methods should help you reduce compile times.
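Why the flat form helps can be seen with a plain-Rust sketch (no chumsky involved; the `Or` struct below is a stand-in I've invented to mirror the shape of chumsky's combinator of the same name):

```rust
// Each `.or()` call wraps the previous parser's type in another generic
// layer, while a tuple-based "choice" stays one level deep regardless of
// how many alternatives it holds.
use std::any::type_name;

#[allow(dead_code)]
struct Or<A, B>(A, B);

fn name_of<T>(_: &T) -> &'static str {
    type_name::<T>()
}

fn main() {
    // Shape of a().or(b()).or(c()).or(d()): three nested layers.
    let nested = Or(Or(Or(0u8, 0u8), 0u8), 0u8);
    // Shape of choice((a(), b(), c(), d())): one flat tuple.
    let flat = (0u8, 0u8, 0u8, 0u8);

    // The nested type mentions `Or` once per `.or()` call...
    assert_eq!(name_of(&nested).matches("Or<").count(), 3);
    // ...while the flat type contains no nested wrappers at all.
    assert_eq!(name_of(&flat).matches("Or<").count(), 0);
}
```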

wackbyte avatar Aug 26 '23 23:08 wackbyte

Aha you're way ahead of me! Using choice brought the cargo check time down to just a couple of seconds. I did get confused for a while because I didn't know it only works with up to 26 options, and the error message you get is not very clear.

Both of these issues are pretty well documented, but I wonder if there's a way to make them clearer for people who don't exhaustively read documentation (i.e. everyone)... though tbh I can't think of any really good ideas.

Timmmm avatar Aug 27 '23 08:08 Timmmm

There is a section in the guide about this. When the guide is ready it'll be very clearly linked from the README, so hopefully more people will happen upon it.

zesterer avatar Aug 27 '23 13:08 zesterer