criterion.rs
Huge variance between successive bench runs without code changes.
Hello 👋
I'm benchmarking a simple key-value store and have noticed huge variance between bench runs with no code changes. Below is the output of two successive bench runs that illustrates the variance:
    Finished bench [optimized] target(s) in 8.67s
     Running benches/db_benchmark.rs (target/release/deps/db_benchmark-bc743580edd94e51)
small_kv/put            time:   [9.8638 µs 11.195 µs 12.815 µs]
                        thrpt:  [74.420 MiB/s 85.187 MiB/s 96.684 MiB/s]
                 change:
                        time:   [+12.268% +26.990% +43.986%] (p = 0.00 < 0.05)
                        thrpt:  [-30.549% -21.254% -10.927%]
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe
small_kv/get            time:   [195.29 µs 202.51 µs 210.08 µs]
                        thrpt:  [4.5396 MiB/s 4.7093 MiB/s 4.8834 MiB/s]
                 change:
                        time:   [+4234.8% +4648.6% +5188.6%] (p = 0.00 < 0.05)
                        thrpt:  [-98.109% -97.894% -97.693%]
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
First Run
    Finished bench [optimized] target(s) in 0.33s
     Running benches/db_benchmark.rs (target/release/deps/db_benchmark-bc743580edd94e51)
small_kv/put            time:   [6.6374 µs 6.9325 µs 7.3164 µs]
                        thrpt:  [130.35 MiB/s 137.57 MiB/s 143.68 MiB/s]
                 change:
                        time:   [-41.179% -33.312% -24.624%] (p = 0.00 < 0.05)
                        thrpt:  [+32.669% +49.952% +70.007%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe
small_kv/get            time:   [99.470 µs 101.21 µs 103.21 µs]
                        thrpt:  [9.2401 MiB/s 9.4230 MiB/s 9.5876 MiB/s]
                 change:
                        time:   [-55.185% -52.739% -50.321%] (p = 0.00 < 0.05)
                        thrpt:  [+101.29% +111.59% +123.14%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
Second Run
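As a sanity check on the numbers above: the thrpt values appear to follow directly from the time estimates and the 1000 bytes per iteration declared with `Throughput::Bytes`, so the throughput swings are the timing swings restated. Here is a minimal sketch of that arithmetic, assuming MiB means 2^20 bytes. The helper name `throughput_mib_s` is made up for illustration and is not part of Criterion.

```rust
/// Bytes processed per iteration, as declared via `Throughput::Bytes(1000)`.
const BYTES_PER_ITER: f64 = 1000.0;
/// Assumption: Criterion's "MiB" is 2^20 bytes.
const MIB: f64 = 1024.0 * 1024.0;

/// Convert a per-iteration time in microseconds into MiB/s.
/// (Hypothetical helper, written only to check the report by hand.)
fn throughput_mib_s(time_us: f64) -> f64 {
    let seconds = time_us * 1e-6;
    BYTES_PER_ITER / seconds / MIB
}

fn main() {
    // Middle time estimates copied from the two runs above.
    let runs = [
        ("first run, put", 11.195),
        ("second run, put", 6.9325),
        ("first run, get", 202.51),
        ("second run, get", 101.21),
    ];
    for (label, time_us) in runs.iter() {
        // Each value should land close to the middle thrpt figure
        // reported for that benchmark, up to display rounding.
        println!("{}: {:.3} MiB/s", label, throughput_mib_s(*time_us));
    }
}
```

Because throughput is inversely proportional to time, the lower bound of each time interval corresponds to the upper bound of the thrpt interval, and vice versa.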
I have noticed that the results are correlated with the number of iterations. With many iterations (e.g. 800K), the reported throughput is much higher than with few iterations (e.g. 50K).
Is there a way to control the number of iterations in order to get comparable results?
Here is my benchmark code in case I'm doing something wrong:
pub fn small_kv_benchmark(c: &mut Criterion) {
    let mut rng = rand::thread_rng();
    let tmp_dir =
        TempDir::new(&gen_string(&mut rng, 16)).expect("failed to create temp dir");
    let mut db = GhalaDB::new(tmp_dir.path(), None).unwrap();
    let mut data = (0usize..)
        .map(|_| (gen_bytes(&mut rng, 36usize), gen_bytes(&mut rng, 1000usize)));
    let mut group = c.benchmark_group("small_kv");
    group.throughput(criterion::Throughput::Bytes(1000u64));
    group.bench_function("put", |b| {
        b.iter_batched(
            || data.next().unwrap(),
            |(k, v)| db.put(k, v),
            criterion::BatchSize::SmallInput,
        )
    });
    let tmp_dir =
        TempDir::new(&gen_string(&mut rng, 16)).expect("failed to create temp dir");
    let mut db = GhalaDB::new(tmp_dir.path(), None).unwrap();
    let mut keys = (0usize..1_000_000)
        .map(|_| {
            let (k, v) =
                (gen_bytes(&mut rng, 36usize), gen_bytes(&mut rng, 1000usize));
            db.put(k.clone(), v).ok();
            k
        })
        .collect::<Vec<_>>();
    keys.sort_unstable();
    let mut keys = keys.into_iter();
    group.bench_function("get", |b| {
        b.iter_batched(
            || keys.next().unwrap_or_else(|| gen_bytes(&mut rng, 36usize)),
            |k| db.get(&k),
            criterion::BatchSize::SmallInput,
        )
    });
    group.finish();
}
Could this be related to #735?