perf(lexer): pad `Token` to 16 bytes
#9918 caused a small regression on lexer and parser benchmarks because it introduced an uninitialized padding byte into `Token`.
Add another dummy `u8` to fill the uninitialized hole.
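For illustration only (the real field names and types in `Token` may differ from this), a layout along these lines has no implicit padding: the explicit dummy fields account for all 16 bytes, so none of them are left uninitialized.

```rust
// Illustrative sketch only - every field except the two dummy padding fields
// is an assumption, not the actual definition. The field sizes add up to
// exactly 16 bytes (align 4), so the compiler inserts no padding of its own.
#[derive(Clone, Copy)]
pub struct Token {
    kind: u8,       // hypothetical 1-byte field
    _padding: u8,   // dummy field filling what would otherwise be a padding byte
    _padding2: u16, // dummy field filling the rest of the hole
    start: u32,     // hypothetical
    end: u32,       // hypothetical
    flags: u32,     // hypothetical
}
```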
CodSpeed Performance Report
Merging #9926 will not alter performance
Comparing 03-20-perf_lexer_pad_token_to_16_bytes (c54c02a) with main (33c1c76)
Summary
✅ 33 untouched benchmarks
I'm not seeing any difference; both are size = 16 (0x10) and align = 4 (0x4). What property do I need to inspect? Can this be a compile-time check?
We already have this:

```rust
#[cfg(test)]
mod size_asserts {
    use super::Token;

    const _: () = assert!(std::mem::size_of::<Token>() == 16);
}
```

Why is this not enough?
Unfortunately, Rust offers no tools to detect whether a type contains padding. So I don't think it's possible to implement a test.
This PR does not affect the type's size or alignment, only whether any of the bytes representing it are uninitialized. When some bytes are uninitialized, creating and copying a `Token` takes more instructions, because the compiler avoids ever reading or writing the uninitialized byte. So e.g. copying a `Token` is 2 x 8-byte reads + 2 x 8-byte writes (overlapping). Whereas with the `_padding: u8` and `_padding2: u16` fields added, there are no uninitialized bytes, so copying a `Token` is just 1 x 16-byte read + 1 x 16-byte write.
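As a minimal illustration of the difference (hypothetical structs, not the real `Token`): `Holey` contains one byte of implicit padding that is never initialized, while `Filled` plugs that byte with an explicit dummy field, so all 16 bytes carry defined values. Whether the copy codegen actually differs is up to the compiler; the hole is what prevents it from treating the value as 16 fully defined bytes.

```rust
// Hypothetical 16-byte structs illustrating the point above.

// Field sizes sum to 15, so the compiler inserts 1 byte of padding somewhere
// in this struct (its position depends on the field order the compiler picks).
#[derive(Clone, Copy)]
struct Holey {
    a: u8,
    b: u16,
    c: u32,
    d: u64,
}

// Same fields plus an explicit dummy byte: no implicit padding remains.
#[derive(Clone, Copy)]
struct Filled {
    a: u8,
    _padding: u8, // fills the hole, so every byte is initialized
    b: u16,
    c: u32,
    d: u64,
}

const _: () = assert!(std::mem::size_of::<Holey>() == 16);
const _: () = assert!(std::mem::size_of::<Filled>() == 16);
```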
This is a micro-micro-optimization, but creating and copying `Token`s is such an extremely hot path that it makes a measurable difference. #3283 was the one that revealed this: a 5% speed-up just from the fields of `Token` getting rearranged.
But this PR isn't succeeding in fixing the perf loss. It looks like changing the `u32` field to a `u16` has caused the other fields to move around, which is also a regression. I'll try to fix it.
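A rough way to see the kind of field movement described here (hypothetical structs, not the real `Token`): with the default `repr(Rust)` the compiler is free to reorder fields, so shrinking one field from `u32` to `u16` can shift the offsets of unrelated fields. `std::mem::offset_of!` (stable since Rust 1.77) makes the resulting layout visible.

```rust
use std::mem::offset_of;

// Hypothetical before/after structs; only the type of `b` changes.
struct Before { a: u8, b: u32, c: u32 }
struct After  { a: u8, b: u16, c: u32 }

fn main() {
    // With the default repr(Rust) the compiler picks the field order,
    // so these offsets can shift in surprising ways between the two.
    println!("Before: a@{} b@{} c@{}",
        offset_of!(Before, a), offset_of!(Before, b), offset_of!(Before, c));
    println!("After:  a@{} b@{} c@{}",
        offset_of!(After, a), offset_of!(After, b), offset_of!(After, c));
}
```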