perf(lexer): pad `Token` to 16 bytes

Open overlookmotel opened this issue 9 months ago • 4 comments

#9918 caused a small regression on lexer and parser benchmarks because it introduced an uninitialized padding byte into Token.

Add another dummy u8 to fill the uninitialized hole.

overlookmotel avatar Mar 20 '25 12:03 overlookmotel


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • 0-merge - adds this PR to the back of the merge queue
  • hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

overlookmotel avatar Mar 20 '25 12:03 overlookmotel

CodSpeed Performance Report

Merging #9926 will not alter performance

Comparing 03-20-perf_lexer_pad_token_to_16_bytes (c54c02a) with main (33c1c76)

Summary

✅ 33 untouched benchmarks

codspeed-hq[bot] avatar Mar 20 '25 12:03 codspeed-hq[bot]

I'm not seeing any difference: both are size = 16 (0x10), align = 0x4. What property do I need to inspect? Can this be a compile-time check?

We already have this

#[cfg(test)]
mod size_asserts {
    use super::Token;
    const _: () = assert!(std::mem::size_of::<Token>() == 16);
}

why is this not enough?

Boshen avatar Mar 20 '25 15:03 Boshen

Unfortunately, Rust offers no tools to detect whether a type contains padding. So I don't think it's possible to implement a test.
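A hand-maintained approximation is possible, though. The sketch below assumes a hypothetical `PaddedToken` struct (its field names and types are illustrative, not the real `Token`) and asserts at compile time that the field sizes add up to the size of the whole type. It is not automatic padding detection: the list of field sizes has to be updated by hand whenever a field changes, and it would not notice padding inside a nested field type.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for `Token`; the fields are illustrative only.
#[allow(dead_code)]
#[derive(Clone, Copy)]
pub struct PaddedToken {
    start: u32,
    end: u32,
    kind: u8,
    flag_a: bool,
    flag_b: bool,
    _padding: u8, // dummy field that fills what would otherwise be a hole
    extra: u32,
}

/// Sum of the field sizes, written out by hand.
/// Must be kept in sync with the struct definition manually.
pub const FIELD_BYTES: usize = size_of::<u32>() // start
    + size_of::<u32>()  // end
    + size_of::<u8>()   // kind
    + size_of::<bool>() // flag_a
    + size_of::<bool>() // flag_b
    + size_of::<u8>()   // _padding
    + size_of::<u32>(); // extra

// If the fields account for every byte of the type, there is no room for padding.
const _: () = assert!(size_of::<PaddedToken>() == FIELD_BYTES);
```

Removing `_padding` from the struct (and from the sum) would make the assertion fail to compile, because the type would still occupy 16 bytes while the fields cover only 15.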

This PR does not affect type size or alignment, only whether any of the bytes representing the type are uninitialized. When they are, creating and copying a Token takes more instructions, because the compiler avoids ever reading or writing the uninitialized byte. So e.g. copying a Token is 2 x 8-byte reads + 2 x 8-byte writes (overlapping). Whereas with the _padding: u8, _padding2: u16 fields added, there are no uninitialized bytes, so copying a Token is just 1 x 16-byte read + 1 x 16-byte write.
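To illustrate why a size assertion cannot tell the two cases apart, here is a minimal sketch with two hypothetical structs, `WithHole` and `Filled` (neither is the real `Token`; the field names are made up). `#[repr(C)]` is used only so that the offsets in the example are deterministic. Both types have the same size and alignment, but only one of them has a byte that no field covers.

```rust
use std::mem::{align_of, offset_of, size_of};

/// Hypothetical layout whose fields cover only 15 of its 16 bytes.
#[allow(dead_code)]
#[derive(Clone, Copy)]
#[repr(C)]
pub struct WithHole {
    start: u32,   // bytes 0..4
    end: u32,     // bytes 4..8
    kind: u8,     // byte 8
    flag_a: bool, // byte 9
    flag_b: bool, // byte 10
    // byte 11 is padding inserted to align `extra`; it is never initialized
    extra: u32,   // bytes 12..16
}

/// Same layout, but a dummy field occupies the byte that was padding.
#[allow(dead_code)]
#[derive(Clone, Copy)]
#[repr(C)]
pub struct Filled {
    start: u32,
    end: u32,
    kind: u8,
    flag_a: bool,
    flag_b: bool,
    _padding: u8, // byte 11, now an ordinary initialized field
    extra: u32,
}

/// Bytes between the end of `flag_b` and the start of `extra` in `WithHole`.
pub const fn gap_with_hole() -> usize {
    offset_of!(WithHole, extra) - (offset_of!(WithHole, flag_b) + size_of::<bool>())
}

/// Bytes between the end of `_padding` and the start of `extra` in `Filled`.
pub const fn gap_filled() -> usize {
    offset_of!(Filled, extra) - (offset_of!(Filled, _padding) + size_of::<u8>())
}

// Size and alignment are identical, so a size assertion passes for both types.
const _: () = assert!(size_of::<WithHole>() == size_of::<Filled>());
const _: () = assert!(align_of::<WithHole>() == align_of::<Filled>());
```

The sketch only shows where the hole is. Whether the compiler actually emits a single 16-byte copy for the filled version depends on the target and optimization level, so the generated assembly or a benchmark is still the real check.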

This is a micro-micro-optimization, but creating and copying Tokens is such an extremely hot path that it makes a measurable difference. #3283 was the PR that revealed this: a 5% speed-up just from the fields of Token getting rearranged.

But this PR isn't succeeding in fixing the perf loss. It looks like changing the u32 field to a u16 has caused the other fields to move around, which is itself a regression. I'll try to fix it.

overlookmotel avatar Mar 21 '25 03:03 overlookmotel