Unexpected Results
To start, I want to thank @bheisler for all of the effort you've put into criterion and iai!!
I've been experimenting with iai and really like the notion of "one-shot" measuring for low level benchmarks. I've played round with it, but sometimes I get unexpected results. This could definitely be an error on my part, that is usually the case, but I've been unable to track it down and thus I've created this issue.
Of note, I get very consistent results if I do multiple runs of a single configuration. But sometimes I run into problems when I change something or run a slightly different command. I then can get results that look wrong to me.
First off, I use an Arch Linux system for development:
$ uname -a
Linux 3900x 6.0.12-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 08 Dec 2022 11:03:38 +0000 x86_64 GNU/Linux
$ inxi -c
CPU: 12-Core AMD Ryzen 9 3900X (-MT MCP-) speed/min/max: 2200/2200/3800 MHz Kernel: 6.0.12-arch1-1 x86_64 Up: 9h 37m
Mem: 6392.5/32019.1 MiB (20.0%) Storage: 465.76 GiB (136.4% used) Procs: 481 Shell: bash 5.1.16 inxi: 3.1.03
I've created exper-iai with cargo new --lib which creates a lib with an fn add and test it_works:
$ cat -n src/lib.rs
1 pub fn add(left: usize, right: usize) -> usize {
2 left + right
3 }
4
5 #[cfg(test)]
6 mod tests {
7 use super::*;
8
9 #[test]
10 fn it_works() {
11 let result = add(2, 2);
12 assert_eq!(result, 4);
13 }
14 }
I then added a simple fn main:
$ cat -n src/main.rs
1 use exper_iai::add;
2
3 fn main() {
4 let r = add(3, 3);
5 assert_eq!(r, 6);
6 println!("{r}");
7 }
And the iai benchmark is:
$ cat -n benches/bench_iai.rs
1 use iai::black_box;
2 use exper_iai::add;
3
4 fn bench_iai_add() {
5 black_box(add(2, 2));
6 }
7
8 iai::main!(bench_iai_add,);
I also created gen_asm.sh so I could see the generated assembler.
$ cat -n asm/add.txt
1 .section .text.exper_iai::add,"ax",@progbits
2 .globl exper_iai::add
3 .p2align 4, 0x90
4 .type exper_iai::add,@function
5 exper_iai::add:
6
7 .cfi_startproc
8 lea rax, [rdi + rsi]
9 ret
10
11 .size exper_iai::add, .Lfunc_end0-exper_iai::add
$ cat -n asm/bench_iai_add.txt
1 .section .text.bench_iai::iai_wrappers::bench_iai_add,"ax",@progbits
2 .p2align 4, 0x90
3 .type bench_iai::iai_wrappers::bench_iai_add,@function
4 bench_iai::iai_wrappers::bench_iai_add:
5
6 .cfi_startproc
7 push rax
8 .cfi_def_cfa_offset 16
9
10 mov edi, 2
11 mov esi, 2
12 call qword ptr [rip + exper_iai::add@GOTPCREL]
13 mov qword ptr [rsp], rax
14
15 mov rax, qword ptr [rsp]
16
17 pop rax
18 .cfi_def_cfa_offset 8
19 ret
20
21 .size bench_iai::iai_wrappers::bench_iai_add, .Lfunc_end5-bench_iai::iai_wrappers::bench_iai_add
Also in Cargo.toml I added [profile.dev] and [profile.release] as well as I added rust-toolchain.toml to keep the toolchain consistent:
$ cat -n Cargo.toml
1 [package]
2 name = "exper_iai"
3 authors = [ "Wink Saville <[email protected]" ]
4 license = "MIT OR Apache-2.0"
5 version = "0.1.0"
6 edition = "2021"
7
8 # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
9
10 [dev-dependencies]
11 criterion = "0.4.0"
12 iai = "0.1.1"
13
14 [[bench]]
15 name = "bench_iai"
16 path = "benches/bench_iai.rs"
17 harness = false
18
19
20 [features]
21
22 # From: https://doc.rust-lang.org/cargo/reference/profiles.html#dev
23 [profile.dev]
24 opt-level = 0
25 debug = true
26 #split-debuginfo = '...' # Platform-specific.
27 debug-assertions = true
28 overflow-checks = true
29 lto = false
30 panic = 'unwind'
31 incremental = true
32 codegen-units = 256
33 rpath = false
34
35 # From: https://doc.rust-lang.org/cargo/reference/profiles.html#release
36 [profile.release]
37 opt-level = 3
38 debug = false
39 #split-debuginfo = '...' # Platform-specific.
40 debug-assertions = false
41 overflow-checks = false
42 lto = false
43 panic = 'unwind'
44 incremental = false
45 codegen-units = 1
46 rpath = false
$ cat -n rust-toolchain.toml
1 [toolchain]
2 channel = "stable"
3 #channel = "nightly"
Running main and test work as expected:
$ cargo run
Compiling exper_iai v0.1.0 (/home/wink/prgs/rust/myrepos/exper-iai)
Finished dev [unoptimized + debuginfo] target(s) in 0.33s
Running `target/debug/exper_iai`
6
$ cargo test
Compiling autocfg v1.1.0
Compiling proc-macro2 v1.0.47
...
Compiling tinytemplate v1.2.1
Compiling criterion v0.4.0
Compiling exper_iai v0.1.0 (/home/wink/prgs/rust/myrepos/exper-iai)
Finished test [unoptimized + debuginfo] target(s) in 8.58s
Running unittests src/lib.rs (target/debug/deps/exper_iai-854898c18c69642d)
running 1 test
test tests::it_works ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/main.rs (target/debug/deps/exper_iai-6092fd66897760dc)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests exper_iai
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
And running cargo bench and yields a more or less expected result:
$ cargo clean
$ cargo bench
Compiling autocfg v1.1.0
Compiling proc-macro2 v1.0.47
...
Compiling tinytemplate v1.2.1
Compiling criterion v0.4.0
Finished bench [optimized] target(s) in 20.33s
Running unittests src/lib.rs (target/release/deps/exper_iai-e0c596df81667934)
running 1 test
test tests::it_works ... ignored
test result: ok. 0 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/main.rs (target/release/deps/exper_iai-bbf641b3842b4eea)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running benches/bench_iai.rs (target/release/deps/bench_iai-e75a6910d1576500)
bench_iai_add
Instructions: 8
L1 Accesses: 12
L2 Accesses: 2
RAM Accesses: 2
Estimated Cycles: 92
And here are two more runs of just bench_iai showing the expected consistency:
$ cargo bench --bench bench_iai
Finished bench [optimized] target(s) in 0.02s
Running benches/bench_iai.rs (target/release/deps/bench_iai-e75a6910d1576500)
bench_iai_add
Instructions: 8 (No change)
L1 Accesses: 12 (No change)
L2 Accesses: 2 (No change)
RAM Accesses: 2 (No change)
Estimated Cycles: 92 (No change)
$ cargo bench --bench bench_iai
Finished bench [optimized] target(s) in 0.02s
Running benches/bench_iai.rs (target/release/deps/bench_iai-e75a6910d1576500)
bench_iai_add
Instructions: 8 (No change)
L1 Accesses: 12 (No change)
L2 Accesses: 2 (No change)
RAM Accesses: 2 (No change)
Estimated Cycles: 92 (No change)
Here is a my first unexpected result, if I change the command line, adding taskset -c 0, I wouldn't expect significantly different results, but Instructions is 0, that is unexpected:
$ taskset -c 0 cargo bench --bench bench_iai
Finished bench [optimized] target(s) in 0.02s
Running benches/bench_iai.rs (target/release/deps/bench_iai-e75a6910d1576500)
bench_iai_add
Instructions: 0 (-100.0000%)
L1 Accesses: 18446744073709551615 (+153722867280912908288%)
L2 Accesses: 2 (No change)
RAM Accesses: 3 (+50.00000%)
Estimated Cycles: 114 (+23.91304%)
$ taskset -c 0 cargo bench --bench bench_iai
Finished bench [optimized] target(s) in 0.02s
Running benches/bench_iai.rs (target/release/deps/bench_iai-e75a6910d1576500)
bench_iai_add
Instructions: 0 (No change)
L1 Accesses: 18446744073709551615 (No change)
L2 Accesses: 2 (No change)
RAM Accesses: 3 (No change)
Estimated Cycles: 114 (No change)
$ taskset -c 0 cargo bench --bench bench_iai
Finished bench [optimized] target(s) in 0.02s
Running benches/bench_iai.rs (target/release/deps/bench_iai-e75a6910d1576500)
bench_iai_add
Instructions: 0 (No change)
L1 Accesses: 18446744073709551615 (No change)
L2 Accesses: 2 (No change)
RAM Accesses: 3 (No change)
Estimated Cycles: 114 (No change)
But a bigger problem is if rename bench_iai.rs to iai.rs and the bench within that file from
bench_iai_add to iai_add:
$ cat -n benches/iai.rs
1 use iai::black_box;
2 use exper_iai::add;
3
4 fn iai_add() {
5 black_box(add(2, 2));
6 }
7
8 iai::main!(iai_add,);
And then I make the necessary changes to get things working again, see exper-iai branch rename-bench_iai_add-to-iai_add. In my opinion only "labels" have changed and none of the actual assembler code has changed.
But now I get really unexpected results, I switch branches and then clean and rerun bench and now Instructions has a value of 22:
$ git switch main
Switched to branch 'main'
Your branch is up to date with 'origin/main'.
$ git switch rename-bench_iai_add-to-iai-add
Switched to branch 'rename-bench_iai_add-to-iai-add'
$ cargo clean
$ cargo bench
Compiling autocfg v1.1.0
Compiling proc-macro2 v1.0.47
...
Compiling tinytemplate v1.2.1
Compiling criterion v0.4.0
Finished bench [optimized] target(s) in 20.60s
Running unittests src/lib.rs (target/release/deps/exper_iai-e0c596df81667934)
running 1 test
test tests::it_works ... ignored
test result: ok. 0 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/main.rs (target/release/deps/exper_iai-bbf641b3842b4eea)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running benches/iai.rs (target/release/deps/iai-1d6df879cc9849e1)
iai_add
Instructions: 22
L1 Accesses: 34
L2 Accesses: 2
RAM Accesses: 2
Estimated Cycles: 114
$ cargo bench --bench iai
Finished bench [optimized] target(s) in 0.02s
Running benches/iai.rs (target/release/deps/iai-1d6df879cc9849e1)
iai_add
Instructions: 22 (No change)
L1 Accesses: 34 (No change)
L2 Accesses: 2 (No change)
RAM Accesses: 2 (No change)
Estimated Cycles: 114 (No change)
$ cargo bench --bench iai
Finished bench [optimized] target(s) in 0.02s
Running benches/iai.rs (target/release/deps/iai-1d6df879cc9849e1)
iai_add
Instructions: 22 (No change)
L1 Accesses: 34 (No change)
L2 Accesses: 2 (No change)
RAM Accesses: 2 (No change)
Estimated Cycles: 114 (No change)