Executing 2 noops in deno bench triggers a 10x delta

Open Nainterceptor opened this issue 2 years ago • 3 comments

Hello,

I'm seeing behavior in Deno.bench that I can't explain:

```typescript
Deno.bench("noop", () => {});
Deno.bench("noop", () => {});
Deno.bench("noop", () => {});
```

```
❯ deno bench --unstable noop.ts
cpu: Apple M1
runtime: deno 1.23.3 (aarch64-apple-darwin)

benchmark      time (avg)             (min … max)       p75       p99      p995
------------------------------------------------- -----------------------------
noop       490.06 ps/iter   (466.6 ps … 47.63 ns)  479.2 ps  516.7 ps  529.1 ps
noop         4.29 ns/iter    (4.12 ns … 32.86 ns)   4.22 ns   4.67 ns   5.38 ns
noop         4.22 ns/iter    (4.12 ns … 33.54 ns)   4.23 ns   4.58 ns   4.79 ns
```

The second and third results are close, and the small difference between them is easily explained by CPU activity, but why is the first one about 10x faster? Maybe there is some kind of "reset" job between benchmarks that isn't executed before the first one. If so, could this "reset" also run before the first benchmark, for consistency? Or could it simply be excluded from the measurements?

Best regards, Gaël

Nainterceptor, Jul 22 '22 14:07

The mitata library has a good explanation:

https://github.com/evanwashere/mitata#jit-bias

"JIT bias". In simple terms, the V8 and JSC JITs expect us to keep passing the same function, so they optimize for it; when we break that assumption by passing a different one, the code gets deoptimized.

littledivy, Jul 23 '22 13:07
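To make the JIT-bias explanation concrete, here is a minimal sketch of a heavily simplified bench runner. It is hypothetical (the names `measure` and `runAll` are illustrative, not Deno's actual internals) and assumes only that every benchmark is invoked through one shared call site. The first function to reach that call site lets the JIT specialize for it; once a different function arrives, that specialization no longer holds, so later benchmarks may run on less optimized code.

```typescript
// Hypothetical, simplified bench runner (NOT Deno's implementation).
type BenchFn = () => void;

interface BenchResult {
  name: string;
  avgNs: number;
}

// Every benchmark is called through the single `fn()` call site below.
// The three noops passed in are distinct function objects, so after the
// first benchmark this call site no longer sees "the same function".
function measure(fn: BenchFn, iterations: number): number {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    fn(); // shared call site: one site, several different targets
  }
  const elapsedMs = performance.now() - start;
  return (elapsedMs * 1e6) / iterations; // average nanoseconds per call
}

function runAll(benches: Array<[string, BenchFn]>, iterations = 1_000_000): BenchResult[] {
  return benches.map(([name, fn]) => ({ name, avgNs: measure(fn, iterations) }));
}

const results = runAll([
  ["noop", () => {}],
  ["noop", () => {}],
  ["noop", () => {}],
]);
for (const r of results) {
  console.log(`${r.name}: ${r.avgNs.toFixed(2)} ns/iter`);
}
```

Whether the first row actually comes out faster depends on the engine version and its heuristics, so treat the printed numbers as illustrative only.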

I agree, but the gap here isn't 323.34 ps vs 387.15 ps; it's 490 ps vs 4290 ps. What's the expected way to avoid introducing this bias? Running a noop bench before the other benches? Example (a silly benchmark, but a good illustration I think :p): trying to find out which is faster, a function that adds or a function that subtracts.

```typescript
function sum (number1: number, number2: number) {
    return number1 + number2;
}

function substract (number1: number, number2: number) {
    return number1 - number2;
}

Deno.bench("sum", () => {
    sum(1,1);
    sum(2,2);
});
Deno.bench("substract", () => {
    substract(1, 1);
    substract(2, 2);
});
```

```
cpu: Apple M1
runtime: deno 1.23.3 (aarch64-apple-darwin)

file:///Users/panda/Projects/steuli/backend/benchmark/bench.ts
benchmark      time (avg)             (min … max)       p75       p99      p995
------------------------------------------------- -----------------------------
sum        495.23 ps/iter   (466.6 ps … 35.36 ns)  487.5 ps    525 ps  533.4 ps
substract    4.26 ns/iter     (4.14 ns … 32.8 ns)   4.17 ns   5.23 ns   5.35 ns
```

Sum is faster!

```
cpu: Apple M1
runtime: deno 1.23.3 (aarch64-apple-darwin)

file:///Users/panda/Projects/steuli/backend/benchmark/bench.ts
benchmark      time (avg)             (min … max)       p75       p99      p995
------------------------------------------------- -----------------------------
substract  492.77 ps/iter    (466.6 ps … 34.6 ns)  487.5 ps    525 ps  533.4 ps
sum          4.36 ns/iter    (4.14 ns … 32.41 ns)
```

Or... substract is faster!

With a noop bench first:

```
cpu: Apple M1
runtime: deno 1.23.3 (aarch64-apple-darwin)

file:///Users/panda/Projects/steuli/backend/benchmark/bench.ts
benchmark      time (avg)             (min … max)       p75       p99      p995
------------------------------------------------- -----------------------------
noop       493.75 ps/iter   (466.6 ps … 44.54 ns)  483.3 ps  529.1 ps  537.5 ps
substract    4.38 ns/iter    (4.15 ns … 34.91 ns)   4.25 ns   5.38 ns    6.3 ns
sum          4.44 ns/iter     (4.32 ns … 8.17 ns)   4.44 ns   5.22 ns   5.56 ns
```

More consistent!

Nainterceptor, Jul 23 '22 15:07

This is still a case of JIT bias; notice the max results for the first bench. V8 deoptimizes when it sees that the function is not the one it expected.

Also, adding two numbers is far too fast to measure correctly; in fact, you could be measuring the timer overhead itself. Try measuring something with meaningful overhead:

```typescript
Deno.bench("Date.now()", () => Date.now());
Deno.bench("Date.now()", () => Date.now());
```

```
cpu: Apple M1
runtime: deno 1.24.0 (aarch64-apple-darwin)

file:///bench.js
benchmark       time (avg)             (min … max)       p75       p99      p995
-------------------------------------------------- -----------------------------
Date.now()   34.02 ns/iter   (32.43 ns … 69.22 ns)  35.65 ns  51.09 ns  51.94 ns
Date.now()   34.94 ns/iter   (33.87 ns … 42.49 ns)  36.51 ns  39.75 ns  40.49 ns
```

littledivy, Jul 23 '22 16:07
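The point about timer overhead can be checked directly. The sketch below is an assumption-laden illustration, not how `deno bench` works internally: `timerOverheadNs` estimates the average cost of one `performance.now()` call by calling it back-to-back, and the hypothetical `batchedNsPerIter` reads the clock only twice per batch of calls so that overhead is spread across the whole batch. Anything much cheaper per call than the clock read itself (like adding two numbers) is hard to time one call at a time.

```typescript
// Estimate the average cost of a single performance.now() call by
// reading the clock back-to-back `samples` times.
function timerOverheadNs(samples = 100_000): number {
  const start = performance.now();
  let last = start;
  for (let i = 0; i < samples; i++) {
    last = performance.now();
  }
  return ((last - start) * 1e6) / samples;
}

// Time `batch` calls of `fn` per round, reading the clock only twice per
// round so its overhead is amortized over the batch. Keeping the fastest
// round is a design choice here: it tends to be less affected by GC pauses
// or scheduler noise than the mean.
function batchedNsPerIter(fn: () => unknown, batch = 10_000, rounds = 50): number {
  let best = Infinity;
  for (let r = 0; r < rounds; r++) {
    const t0 = performance.now();
    for (let i = 0; i < batch; i++) fn();
    const t1 = performance.now();
    best = Math.min(best, ((t1 - t0) * 1e6) / batch);
  }
  return best;
}

console.log(`timer overhead ~ ${timerOverheadNs().toFixed(2)} ns/call`);
console.log(`Date.now() ~ ${batchedNsPerIter(() => Date.now()).toFixed(2)} ns/iter`);
```

The clock's resolution can also be coarse in some runtimes, which is another reason to time batches rather than single calls.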

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale[bot], Sep 24 '22 06:09

> The mitata library has a good explanation:
>
> evanwashere/mitata#jit-bias
>
> "JIT bias". In simple words, v8 and JSC JIT expect us to pass the same function, so they optimize for it, but we break that promise and get deoptimized.

Should Deno prevent users of its benchmark API from running into this optimization effect on their benchmark functions, for consistency?

bb010g, Oct 03 '22 03:10

We could warm up Deno.bench before running user benchmarks (though that won't guarantee consistency, since V8's optimization behavior might change).

littledivy, Oct 03 '22 04:10

@littledivy is right here; we just hit this problem again today. We should bench an empty function before we start running user benchmarks. If anyone is interested in fixing this, I'll be glad to provide pointers.

bartlomieju, Dec 22 '22 20:12
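The proposed fix (bench an empty function before any user benchmark) can be sketched with a hypothetical runner. `measure` and `runWithWarmup` are illustrative names, not Deno internals, and the sketch assumes all benchmarks share one measurement call site. A throwaway noop passes through that call site first and its result is discarded, so no user benchmark is the only function the call site has seen. As noted in the thread, V8's heuristics can change, so this is a mitigation rather than a guarantee.

```typescript
// Hypothetical runner sketch (NOT Deno's implementation).
type BenchFn = () => void;

interface BenchResult {
  name: string;
  avgNs: number;
}

// Shared measurement call site used by every benchmark, warm-up included.
function measure(fn: BenchFn, iterations: number): number {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  return ((performance.now() - start) * 1e6) / iterations; // ns per call
}

function runWithWarmup(benches: Array<[string, BenchFn]>, iterations = 1_000_000): BenchResult[] {
  // Warm-up: bench an empty function and throw the number away, mirroring
  // the manual "noop first" workaround from this thread.
  measure(() => {}, iterations);
  return benches.map(([name, fn]) => ({ name, avgNs: measure(fn, iterations) }));
}

const sum = (a: number, b: number) => a + b;
const substract = (a: number, b: number) => a - b;

const results = runWithWarmup([
  ["sum", () => { sum(1, 1); sum(2, 2); }],
  ["substract", () => { substract(1, 1); substract(2, 2); }],
]);
for (const r of results) {
  console.log(`${r.name}: ${r.avgNs.toFixed(2)} ns/iter`);
}
```

The warm-up result is deliberately never reported, so users only see rows for their own benchmarks.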