Size of parser stack items makes parsing slow
One item that appears on benchmarks (currently measuring about 8% on anm-benchmark for th15/title.anm) is memcpy calls in `lalrparser::__parse__Anything::__reduce`:

After some digging in binja and CE, it appears that the majority of memcpy samples counted were likely in these calls to `__symbols.pop()`:
These `(Location, Symbol, Location)` tuples are currently 256 bytes large. `__Symbol` is a union of all kinds of nonterminals:
```rust
pub enum __Symbol<'input> {
    Variant0(Token<'input>),
    Variant1(&'input str),
    Variant2(::std::option::Option<Token<'input>>),
    Variant3(Sp<i32>),
    Variant4(::std::option::Option<Sp<i32>>),
    Variant5(()),
    ...
    Variant13(Sp<crate::parse::AnythingValue>), // <-- largest variant
    ...
    Variant85(ast::Var),
    Variant86(ast::VarDeclKeyword),
}
```
It's 256 bytes because `Stmt` is currently 208 bytes (then there's the 8-byte discriminant in `AnythingValue`, the 16 bytes added by `Sp<...>`, another 8 bytes for `__Symbol`'s discriminant, then 8 bytes for each `Location`).
We could probably speed things up a little bit by making `Stmt` smaller. I'm not sure how much we can do effectively short term aside from boxing some things, which has its own cost; in the long term, arenas may help. Also, it would be fantabulous if we could figure out how to fix `Span`s to only be 8 bytes instead of an awkwardly-padded 12.

More importantly, in the meanwhile... we want to be careful not to let `Stmt` grow any bigger.
Current status:

- `Sp<Stmt>` is currently 320 bytes.
- If I use black magic and sorcery to make `Span` and `Option<Span>` both 8 bytes, the size reduces to 256.
  - This 25% size reduction equated to a 10% runtime reduction when running on all 7 EoSD ECL files. (note: WSL 2, files stored on NTFS)
  - This "black magic and sorcery" actually makes it quite difficult to reason about things. I might consider sticking some boxes into `Stmt` instead...
Boxing `Stmt` did not help.

I made all `Stmt`-producing productions return `Box<ast::Stmt>` or `Box<ast::StmtKind>` instead, and dereferenced each when inserting it into an AST node. I copied the symbol enum from `generated.rs` and verified that its size was now reduced to 192 bytes. Unfortunately, runtime performance did not change from what it is currently.
Doing this required me to write this travesty in place of what used to be `<stmts:(Sp<Stmt>)*>`:
```
#[inline]
SpStmts0: Vec<Sp<ast::Stmt>> = {
    => vec![],
    SpStmts1,
};

SpStmts1: Vec<Sp<ast::Stmt>> = {
    <stmt:Sp<BoxStmt>> => vec![sp!(stmt.span => *stmt.value)],
    <stmts:SpStmts1> <stmt:Sp<BoxStmt>> => util::push(stmts, sp!(stmt.span => *stmt.value)),
};
```
An aside: it looks like inline rules in the grammar still generate states that are placed on the stack. I discovered this when I tried to clean up the above code using the following, so that I could write `<stmts:(Sp<Unbox<BoxStmt>>)*>`:
```
// NOTE: pub type UnboxType<T> = <T as core::ops::Deref>::Target;
#[inline]
Unbox<Rule>: util::UnboxType<Rule> = <r:Rule> => *r;
```
Unfortunately, despite the `#[inline]`, this still produces a variant:

```
Variant102(util::UnboxType<Box<ast::Stmt>>),
```
Profiling again with the boxed `Stmt`s, there doesn't look to be a significant contribution from allocation; it's still all memcpy...
I tried replacing the boxes with a `qcell`-based `Pool` abstraction: a tiny fixed-size pool of two reusable `Option<ast::Stmt>`s and `Option<ast::StmtKind>`s. The statements in the symbol enum effectively became `&'a LCell<Option<T>>`: pointer-sized values that could be created from (and returned into) `T` without any allocation or synchronization.

This still had no effect.
This seems paradoxical. It leads me to believe that perhaps it is not the size of the `Symbol` enum that is the problem, but rather the size of the ast structs in general (including `Stmt`). But were that the case, the loop desugaring/decompilation passes should show up much more here...
I don't know.
Relevant branches:
- no-file-id: 2de9d83
- try-qcell: 43e0ba1