oxc icon indicating copy to clipboard operation
oxc copied to clipboard

Change type of padding in lexer `Token`?

Open overlookmotel opened this issue 9 months ago • 1 comments

#3283 showed a mysterious 5% speed-up on lexer benchmarks (see https://github.com/oxc-project/oxc/pull/3283#issuecomment-2111835472).

It's unclear what caused this, as the changes to the lexer in that PR actually make the lexer do more work, not less.

Only explanation I can see is that for some reason, replacing one of the u8 padding bytes in Token with a bool produced better codegen somehow (the mysteries of the compiler!)

We could try converting the other 4 padding bytes u32 to bools too [bool; 4], and see if that makes a difference.

Ideally we'd use a type which has no "active bits" at all instead of bool, so compiler can know it can leave these fields uninitialized. A type which is like (), but with size 1 instead of size 0. But does such a type exist? Generally such a type would be completely useless!

overlookmotel avatar May 15 '24 08:05 overlookmotel

Actually I think I may have figured it out!

Changing the field from u8 to bool altered the type layout.

After #3283:

#[derive(Debug, Clone, Copy, Default)]
pub struct Token {
    pub kind: Kind,
    pub start: u32,
    pub end: u32,
    pub is_on_new_line: bool,
    pub escaped: bool,
    has_separator: bool,
    _padding2: u32,
}

const _: () = {
    assert!(std::mem::offset_of!(Token, start) == 0);
    assert!(std::mem::offset_of!(Token, end) == 4);
    assert!(std::mem::offset_of!(Token, kind) == 12);
    assert!(std::mem::offset_of!(Token, is_on_new_line) == 13);
    assert!(std::mem::offset_of!(Token, escaped) == 14);
    assert!(std::mem::offset_of!(Token, has_separator) == 15);
    assert!(std::mem::offset_of!(Token, _padding2) == 8);
};

Whereas before:

#[derive(Debug, Clone, Copy, Default)]
pub struct TokenBefore {
    pub kind: Kind,
    pub start: u32,
    pub end: u32,
    pub is_on_new_line: bool,
    pub escaped: bool,
    _padding: u8,
    _padding2: u32,
}

const _: () = {
    assert!(std::mem::offset_of!(TokenBefore, start) == 0);
    assert!(std::mem::offset_of!(TokenBefore, end) == 4);
    assert!(std::mem::offset_of!(TokenBefore, kind) == 13);
    assert!(std::mem::offset_of!(TokenBefore, is_on_new_line) == 14);
    assert!(std::mem::offset_of!(TokenBefore, escaped) == 15);
    assert!(std::mem::offset_of!(TokenBefore, _padding) == 12);
    assert!(std::mem::offset_of!(TokenBefore, _padding2) == 8);
};

Notice that kind was at offset 13, whereas now it's at offset 12. Maybe being at 12 is better, because it saves an op if load the last 4 bytes into a register, then no shift is required to get kind. And kind is the most-accessed field.

Maybe. Maybe? But 5%??!

If we want to squeeze every drop out of this, then probably rather than changing padding field from u32 to [bool; 4], we should make the type #[repr(C)] and then reorder the fields manually until we get the best perf out of it.

Likely better to move the padding u32 to the end (offset 12) and have the 1-byte fields grouped together in bytes 8-11, with kind first.

It may be better to put all the 1-byte fields at the start, as kind is the most accessed. Then followed by start and end, and finally the padding.

Or maybe 1-byte fields at the start, then the padding, then start and end. Make the struct #[repr(C, align(8))] so that start + end occupy bytes 8-15, and can be loaded together into a register with a single 8-byte aligned read.

This is all real micro-opt stuff. It only makes a difference here because creating tokens is the majority of what the lexer does, so it's the definition of a hot path. And it's not academic - #3283 shows we are seeing real world gains from changing the layout.

overlookmotel avatar May 15 '24 15:05 overlookmotel

https://github.com/oxc-project/oxc/pull/3964#issuecomment-2198487332

No performance improvements after numerous attempts.

Boshen avatar Jun 30 '24 10:06 Boshen