Size of parser stack items makes parsing slow
One item that appears on benchmarks (currently measuring about 8% on anm-benchmark for th15/title.anm) is memcpy calls in `lalrparser::__parse__Anything::__reduce`:

After some digging in binja and CE, it appears that the majority of memcpy samples counted were likely in these calls to `__symbols.pop()`:
These `(Location, Symbol, Location)` tuples are currently 256 bytes large. `__Symbol` is a union of all kinds of nonterminals:
```rust
pub enum __Symbol<'input> {
    Variant0(Token<'input>),
    Variant1(&'input str),
    Variant2(::std::option::Option<Token<'input>>),
    Variant3(Sp<i32>),
    Variant4(::std::option::Option<Sp<i32>>),
    Variant5(()),
    ...
    Variant13(Sp<crate::parse::AnythingValue>), // <-- largest variant
    ...
    Variant85(ast::Var),
    Variant86(ast::VarDeclKeyword),
}
```
It's 256 bytes because `Stmt` is currently 208 bytes (then there's the 8-byte discriminant in `AnythingValue`, the 16 bytes added by `Sp<...>`, another 8 bytes for `__Symbol`'s discriminant, then 8 bytes for each `Location`).
We could probably speed things up a little bit by making `Stmt` smaller. I'm not sure how much we can do effectively short term aside from boxing some things, which has its own cost; in the long term, arenas may help. Also, it would be fantabulous if we could figure out how to fix `Span`s to only be 8 bytes instead of an awkwardly-padded 12.

More importantly, in the meanwhile... we want to be careful not to let `Stmt` grow any bigger.
Current status:

- `Sp<Stmt>` is currently 320 bytes.
- If I use black magic and sorcery to make `Span` and `Option<Span>` both 8 bytes, the size reduces to 256.
  - This 25% size reduction equated to a 10% runtime reduction when running on all 7 EoSD ECL files. (note: WSL 2, files stored on NTFS)
  - This "black magic and sorcery" actually makes it quite difficult to reason about things. I might consider sticking some boxes into `Stmt` instead...
Boxing `Stmt` did not help.

I made all `Stmt`-producing productions return `Box<ast::Stmt>` or `Box<ast::StmtKind>` instead, and dereferenced each when inserting it into an AST node. I copied the symbol enum from `generated.rs` and verified that its size was now reduced to 192 bytes. Unfortunately, runtime performance did not change from what it is currently.
Doing this required me to write this travesty in place of what used to be `<stmts:(Sp<Stmt>)*>`:
```
#[inline]
SpStmts0: Vec<Sp<ast::Stmt>> = {
    => vec![],
    SpStmts1,
};

SpStmts1: Vec<Sp<ast::Stmt>> = {
    <stmt:Sp<BoxStmt>> => vec![sp!(stmt.span => *stmt.value)],
    <stmts:SpStmts1> <stmt:Sp<BoxStmt>> => util::push(stmts, sp!(stmt.span => *stmt.value)),
};
```
An aside: it looks like inline rules in the grammar still generate states that are placed on the stack. I discovered this when I tried to clean up the above code using the following, so that I could write `<stmts:(Sp<Unbox<BoxStmt>>)*>`:
```
// NOTE: pub type UnboxType<T> = <T as core::ops::Deref>::Target;
#[inline]
Unbox<Rule>: util::UnboxType<Rule> = <r:Rule> => *r;
```
Unfortunately, despite the `#[inline]`, this still produces a variant:

```
Variant102(util::UnboxType<Box<ast::Stmt>>),
```
Profiling again with the boxed `Stmt`s, there doesn't look to be a significant contribution from allocation; it's still all memcpy...
I tried replacing the boxes with a `qcell`-based `Pool` abstraction: a tiny fixed-size pool of two reusable `Option<ast::Stmt>`s and `Option<ast::StmtKind>`s. The statements in the symbol enum effectively became `&'a LCell<Option<T>>`: pointer-sized values that could be created from (and returned into) `T` without any allocation or synchronization.

This still had no effect.
This seems paradoxical. It leads me to believe that perhaps it is not the size of the `Symbol` enum that is the problem, but rather the size of the ast structs in general (including `Stmt`). But were that the case, the loop desugaring/decompilation passes should show up much more here...
I don't know.
Relevant branches:
- no-file-id: 2de9d83
- try-qcell: 43e0ba1