perf(lexer): pad `Token` to 16 bytes
#9918 caused a small regression on lexer and parser benchmarks because it introduced an uninitialized padding byte into `Token`.
Add another dummy `u8` to fill the uninitialized hole.
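For illustration only (the real field names and types in `Token` may differ from this), a layout along these lines has no implicit padding: the explicit dummy fields account for all 16 bytes, so none of them are left uninitialized.

```rust
// Illustrative sketch only - every field except the two dummy padding fields
// is an assumption, not the actual definition. The field sizes add up to
// exactly 16 bytes (align 4), so the compiler inserts no padding of its own.
#[derive(Clone, Copy)]
pub struct Token {
    kind: u8,       // hypothetical 1-byte field
    _padding: u8,   // dummy field filling what would otherwise be a padding byte
    _padding2: u16, // dummy field filling the rest of the hole
    start: u32,     // hypothetical
    end: u32,       // hypothetical
    flags: u32,     // hypothetical
}
```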
CodSpeed Performance Report
Merging #9926 will not alter performance
Comparing 03-20-perf_lexer_pad_token_to_16_bytes (c54c02a) with main (33c1c76)
Summary
✅ 33 untouched benchmarks
I'm not seeing any difference; both are size = 16 (0x10) and align = 4 (0x4). What property do I need to inspect? Can this be a compile-time check?
We already have this:

```rust
#[cfg(test)]
mod size_asserts {
    use super::Token;

    const _: () = assert!(std::mem::size_of::<Token>() == 16);
}
```

Why is this not enough?
Unfortunately, Rust offers no tools to detect whether a type contains padding. So I don't think it's possible to implement a test.
This PR does not affect the type's size or alignment, only whether any of the bytes representing it are uninitialized. When some bytes are uninitialized, creating and copying a `Token` takes more instructions, because the compiler avoids ever reading or writing the uninitialized byte. So e.g. copying a `Token` is 2 x 8-byte reads + 2 x 8-byte writes (overlapping). Whereas with the `_padding: u8` and `_padding2: u16` fields added, there are no uninitialized bytes, so copying a `Token` is just 1 x 16-byte read + 1 x 16-byte write.
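As a minimal illustration of the difference (hypothetical structs, not the real `Token`): `Holey` contains one byte of implicit padding that is never initialized, while `Filled` plugs that byte with an explicit dummy field, so all 16 bytes carry defined values. Whether the copy codegen actually differs is up to the compiler; the hole is what prevents it from treating the value as 16 fully defined bytes.

```rust
// Hypothetical 16-byte structs illustrating the point above.

// Field sizes sum to 15, so the compiler inserts 1 byte of padding somewhere
// in this struct (its position depends on the field order the compiler picks).
#[derive(Clone, Copy)]
struct Holey {
    a: u8,
    b: u16,
    c: u32,
    d: u64,
}

// Same fields plus an explicit dummy byte: no implicit padding remains.
#[derive(Clone, Copy)]
struct Filled {
    a: u8,
    _padding: u8, // fills the hole, so every byte is initialized
    b: u16,
    c: u32,
    d: u64,
}

const _: () = assert!(std::mem::size_of::<Holey>() == 16);
const _: () = assert!(std::mem::size_of::<Filled>() == 16);
```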
This is a micro-micro-optimization, but creating and copying `Token`s is such an extremely hot path that it makes a measurable difference. #3283 was the one that revealed this: a 5% speed-up just from the fields of `Token` getting rearranged.
But this PR isn't succeeding in fixing the perf loss. It looks like changing the `u32` field to a `u16` has caused the other fields to move around, which is also a regression. I'll try to fix it.
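A rough way to see the kind of field movement described here (hypothetical structs, not the real `Token`): with the default `repr(Rust)` the compiler is free to reorder fields, so shrinking one field from `u32` to `u16` can shift the offsets of unrelated fields. `std::mem::offset_of!` (stable since Rust 1.77) makes the resulting layout visible.

```rust
use std::mem::offset_of;

// Hypothetical before/after structs; only the type of `b` changes.
struct Before { a: u8, b: u32, c: u32 }
struct After  { a: u8, b: u16, c: u32 }

fn main() {
    // With the default repr(Rust) the compiler picks the field order,
    // so these offsets can shift in surprising ways between the two.
    println!("Before: a@{} b@{} c@{}",
        offset_of!(Before, a), offset_of!(Before, b), offset_of!(Before, c));
    println!("After:  a@{} b@{} c@{}",
        offset_of!(After, a), offset_of!(After, b), offset_of!(After, c));
}
```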