oxc
oxc copied to clipboard
Change type of padding in lexer `Token`?
#3283 showed a mysterious 5% speed-up on lexer benchmarks (see https://github.com/oxc-project/oxc/pull/3283#issuecomment-2111835472).
It's unclear what caused this, as the changes to the lexer in that PR actually make the lexer do more work, not less.
Only explanation I can see is that for some reason, replacing one of the u8
padding bytes in Token
with a bool
produced better codegen somehow (the mysteries of the compiler!)
We could try converting the other 4 padding bytes u32
to bools too [bool; 4]
, and see if that makes a difference.
Ideally we'd use a type which has no "active bits" at all instead of bool
, so compiler can know it can leave these fields uninitialized. A type which is like ()
, but with size 1 instead of size 0. But does such a type exist? Generally such a type would be completely useless!
Actually I think I may have figured it out!
Changing the field from u8
to bool
altered the type layout.
After #3283:
#[derive(Debug, Clone, Copy, Default)]
pub struct Token {
pub kind: Kind,
pub start: u32,
pub end: u32,
pub is_on_new_line: bool,
pub escaped: bool,
has_separator: bool,
_padding2: u32,
}
const _: () = {
assert!(std::mem::offset_of!(Token, start) == 0);
assert!(std::mem::offset_of!(Token, end) == 4);
assert!(std::mem::offset_of!(Token, kind) == 12);
assert!(std::mem::offset_of!(Token, is_on_new_line) == 13);
assert!(std::mem::offset_of!(Token, escaped) == 14);
assert!(std::mem::offset_of!(Token, has_separator) == 15);
assert!(std::mem::offset_of!(Token, _padding2) == 8);
};
Whereas before:
#[derive(Debug, Clone, Copy, Default)]
pub struct TokenBefore {
pub kind: Kind,
pub start: u32,
pub end: u32,
pub is_on_new_line: bool,
pub escaped: bool,
_padding: u8,
_padding2: u32,
}
const _: () = {
assert!(std::mem::offset_of!(TokenBefore, start) == 0);
assert!(std::mem::offset_of!(TokenBefore, end) == 4);
assert!(std::mem::offset_of!(TokenBefore, kind) == 13);
assert!(std::mem::offset_of!(TokenBefore, is_on_new_line) == 14);
assert!(std::mem::offset_of!(TokenBefore, escaped) == 15);
assert!(std::mem::offset_of!(TokenBefore, _padding) == 12);
assert!(std::mem::offset_of!(TokenBefore, _padding2) == 8);
};
Notice that kind
was at offset 13, whereas now it's at offset 12. Maybe being at 12 is better, because it saves an op if load the last 4 bytes into a register, then no shift is required to get kind
. And kind
is the most-accessed field.
Maybe. Maybe? But 5%??!
If we want to squeeze every drop out of this, then probably rather than changing padding field from u32
to [bool; 4]
, we should make the type #[repr(C)]
and then reorder the fields manually until we get the best perf out of it.
Likely better to move the padding u32
to the end (offset 12) and have the 1-byte fields grouped together in bytes 8-11, with kind
first.
It may be better to put all the 1-byte fields at the start, as kind
is the most accessed. Then followed by start
and end
, and finally the padding.
Or maybe 1-byte fields at the start, then the padding, then start
and end
. Make the struct #[repr(C, align(8))]
so that start
+ end
occupy bytes 8-15, and can be loaded together into a register with a single 8-byte aligned read.
This is all real micro-opt stuff. It only makes a difference here because creating tokens is the majority of what the lexer does, so it's the definition of a hot path. And it's not academic - #3283 shows we are seeing real world gains from changing the layout.
https://github.com/oxc-project/oxc/pull/3964#issuecomment-2198487332
No performance improvements after numerous attempts.