Executing 2 noops in deno bench triggers a 10x delta
Hello,
I'm seeing a behavior in Deno.bench that I cannot explain:
Deno.bench("noop", () => {});
Deno.bench("noop", () => {});
Deno.bench("noop", () => {});
❯ deno bench --unstable noop.ts
cpu: Apple M1
runtime: deno 1.23.3 (aarch64-apple-darwin)
benchmark time (avg) (min … max) p75 p99 p995
------------------------------------------------- -----------------------------
noop 490.06 ps/iter (466.6 ps … 47.63 ns) 479.2 ps 516.7 ps 529.1 ps
noop 4.29 ns/iter (4.12 ns … 32.86 ns) 4.22 ns 4.67 ns 5.38 ns
noop 4.22 ns/iter (4.12 ns … 33.54 ns) 4.23 ns 4.58 ns 4.79 ns
It's easy to say that the second and third results are close and explained by CPU activity, but why is the first one ~10x faster? Maybe there is something like a "reset" job running between benches that isn't executed before the first one. In that case, would it be possible to run this "reset" before the first job too, for consistency? Or just not measure it?
Best regards, Gaël
The mitata library has a good explanation:
https://github.com/evanwashere/mitata#jit-bias
"JIT bias". In simple words, v8 and JSC JIT expect us to pass the same function, so they optimize for it, but we break that promise and get deoptimized.
I agree, but here it's not a jump from 323.34 ps to 387.15 ps, it's from 490 ps to 4290 ps. What is the expected way to avoid introducing this bias? Running a noop bench before the other benches? Example (a silly benchmark, but a good illustration I guess :p): trying to find out which is faster between a function doing a sum and a function doing a subtraction.
function sum (number1: number, number2: number) {
return number1 + number2;
}
function substract (number1: number, number2: number) {
return number1 - number2;
}
Deno.bench("sum", () => {
sum(1,1);
sum(2,2);
});
Deno.bench("substract", () => {
substract(1, 1);
substract(2, 2);
});
cpu: Apple M1
runtime: deno 1.23.3 (aarch64-apple-darwin)
file:///Users/panda/Projects/steuli/backend/benchmark/bench.ts
benchmark time (avg) (min … max) p75 p99 p995
------------------------------------------------- -----------------------------
sum 495.23 ps/iter (466.6 ps … 35.36 ns) 487.5 ps 525 ps 533.4 ps
substract 4.26 ns/iter (4.14 ns … 32.8 ns) 4.17 ns 5.23 ns 5.35 ns
Sum is faster!
cpu: Apple M1
runtime: deno 1.23.3 (aarch64-apple-darwin)
file:///Users/panda/Projects/steuli/backend/benchmark/bench.ts
benchmark time (avg) (min … max) p75 p99 p995
------------------------------------------------- -----------------------------
substract 492.77 ps/iter (466.6 ps … 34.6 ns) 487.5 ps 525 ps 533.4 ps
sum 4.36 ns/iter (4.14 ns … 32.41 ns)
Or... Substract is faster
With a noop before:
cpu: Apple M1
runtime: deno 1.23.3 (aarch64-apple-darwin)
file:///Users/panda/Projects/steuli/backend/benchmark/bench.ts
benchmark time (avg) (min … max) p75 p99 p995
------------------------------------------------- -----------------------------
noop 493.75 ps/iter (466.6 ps … 44.54 ns) 483.3 ps 529.1 ps 537.5 ps
substract 4.38 ns/iter (4.15 ns … 34.91 ns) 4.25 ns 5.38 ns 6.3 ns
sum 4.44 ns/iter (4.32 ns … 8.17 ns) 4.44 ns 5.22 ns 5.56 ns
More consistent!
Still a case of 'JIT bias'; notice the max results for the first bench. V8 deoptimizes when it sees that the function is not what it expected.
Also, adding two numbers is way too fast to measure correctly; in fact, you could be measuring the timer overhead itself. Try measuring something with meaningful overhead:
Deno.bench("Date.now()", () => Date.now());
Deno.bench("Date.now()", () => Date.now());
cpu: Apple M1
runtime: deno 1.24.0 (aarch64-apple-darwin)
file:///bench.js
benchmark time (avg) (min … max) p75 p99 p995
-------------------------------------------------- -----------------------------
Date.now() 34.02 ns/iter (32.43 ns … 69.22 ns) 35.65 ns 51.09 ns 51.94 ns
Date.now() 34.94 ns/iter (33.87 ns … 42.49 ns) 36.51 ns 39.75 ns 40.49 ns
> The mitata library has a good explanation: "JIT bias". In simple words, v8 and JSC JIT expect us to pass the same function, so they optimize for it, but we break that promise and get deoptimized.
Should Deno prevent consumers of its benchmark API from running into this deoptimization, for consistency?
We could warm up Deno.bench before running user benchmarks (that won't guarantee consistency; the optimization behaviour in V8 might change).
@littledivy is right here, we just hit this problem again today. We should have an empty function benched before we start running user benchmarks. If anyone is interested in fixing this problem I'll be glad to provide pointers.