oxc icon indicating copy to clipboard operation
oxc copied to clipboard

Fast path for parsing decimal literals

Open overlookmotel opened this issue 9 months ago • 2 comments

#3283 made an optimization to the parser to avoid checking for _s in numeric literals twice, and got a nice 1% speed-up on parser benchmarks.

It occurs to me that we could repeat the trick for the common case of simple decimal literals (e.g. 0, 1, 123). str::parse::<f64>() is quite a complicated function which checks for all kinds of things (exponent, decimal point, valid input etc). All of this is repeated work, since we've already determined in the lexer what the input is.

Perhaps we could have a flag which lexer sets if the input is a simple case, and then use a hand-rolled parse_simple_decimal function to convert to f64, like the ones we have for binary and octal?

https://github.com/oxc-project/oxc/blob/f38d138d970f97897d545f42cdd7a0068b5e894c/crates/oxc_parser/src/lexer/number.rs#L76-L88

(we should base implementation on std's str::parse::<f64> as it's probably very optimized already, but just cut out all the extraneous checks)

Given that #3283 got a 1% speed-up, maybe this can get the same again, or maybe more?

@DonIsaac Would you be interested in investigating?

overlookmotel avatar May 15 '24 08:05 overlookmotel

As I mentioned in #3289, it's mysterious why #3283 got a 5% speed bump on lexer benchmarks.

Before we test out what I'm suggesting in this issue, we should investigate that further. If it turns out that converting the padding bytes in Token from u8 to bool is a speed-gain in itself, we should do that first, so then we can measure the impact of the change described here in isolation (as it will also involve using one of those padding bytes as a bool).

overlookmotel avatar May 15 '24 08:05 overlookmotel

Just did something similar for non-decimals in #3296. Decimals are up next

DonIsaac avatar May 15 '24 18:05 DonIsaac

@overlookmotel I discovered that Kind::Decimals are never floating point numbers. However, when I tried parsing them as an i64 then casting the result to an f64, I found no performance improvements. However, my local benchmarking is not always accurate, so I'll make a small PR so we can see bench results from CI

DonIsaac avatar May 16 '24 15:05 DonIsaac

Is this resolved?

Boshen avatar Jul 14 '24 04:07 Boshen

I don't believe so.

However, when I tried parsing them as an i64 then casting the result to an f64, I found no performance improvements.

@DonIsaac Did you try parsing to a u64 (not i64)? Or maybe i64 was a typo?

overlookmotel avatar Jul 14 '24 10:07 overlookmotel

I don't believe so.

However, when I tried parsing them as an i64 then casting the result to an f64, I found no performance improvements.

@DonIsaac Did you try parsing to a u64 (not i64)? Or maybe i64 was a typo?

I did, but I saw no performance gains.

DonIsaac avatar Jul 14 '24 14:07 DonIsaac

Did you try implementing it manually like parse_binary and parse_octal (rather than using s.parse::<u64>())?

If you're in parse_int_without_underscores and kind is Kind::Decimal, then we know all the digits are 0-9. This means parsing is infallible and no need to handle all the other oddities which s.parse::<u64>() does (e, +, -, illegal chars, zero length etc). I'd imagine a hand-written version without all these complications would be faster.

overlookmotel avatar Jul 14 '24 15:07 overlookmotel

I gave it a go (#4257). Yes, hardly any effect. Surprising!

overlookmotel avatar Jul 14 '24 16:07 overlookmotel

@DonIsaac I'm frustrated how little effect this change had, though it in retrospect it makes sense - multiplying by 10 is just inherently a (relatively) slow operation. Why oh why don't humans have 8 fingers and then decimal would never have existed? 😃

Do you think there'd be any gain in adding a token kind for single digit number, with a fast path? 0 is likely the most common number used in JS code e.g. for (let i = 0; i < arr.length; i++) and other single-digit numbers are probably common too. Or do you think that's too much of a micro-opt to be unlikely to produce any real gain?

overlookmotel avatar Jul 15 '24 11:07 overlookmotel