kotlinx-benchmark
Weird result for warmup vs actual iterations for native benchmarks
This is the benchmark class:
import kotlinx.benchmark.*

@State(Scope.Benchmark)
@OutputTimeUnit(BenchmarkTimeUnit.MILLISECONDS)
@Measurement(time = 1, timeUnit = BenchmarkTimeUnit.SECONDS)
@BenchmarkMode(Mode.Throughput)
class SimpleOps {
    private var a = 0
    private var b = 0

    @Setup
    fun setup() {
        a = 10
        b = 20
    }

    @Benchmark
    fun add(): Int {
        return a + b
    }

    @Benchmark
    fun sub(): Int {
        return a - b
    }

    @Benchmark
    fun mul(): Int {
        return a * b
    }

    @Benchmark
    fun div(): Int {
        return a / b
    }
}
The results:
… benchmark.SimpleOps.add
Warm-up #0: 6,829.74 ops/ms
Warm-up #1: 7,094.42 ops/ms
Iteration #0: 89,446.7 ops/ms
Iteration #1: 92,966.6 ops/ms
Iteration #2: 83,125.1 ops/ms
Success: ~ 88,513.1 ops/ms ±6.4%
… benchmark.SimpleOps.div
Warm-up #0: 6,988.84 ops/ms
Warm-up #1: 6,935.19 ops/ms
Iteration #0: 69,176.1 ops/ms
Iteration #1: 69,718.8 ops/ms
Iteration #2: 82,222.2 ops/ms
Success: ~ 73,706.0 ops/ms ±11%
… benchmark.SimpleOps.mul
Warm-up #0: 5,537.26 ops/ms
Warm-up #1: 5,530.41 ops/ms
Iteration #0: 21,964.3 ops/ms
Iteration #1: 22,796.2 ops/ms
Iteration #2: 21,789.4 ops/ms
Success: ~ 22,183.3 ops/ms ±2.7%
… benchmark.SimpleOps.sub
Warm-up #0: 7,040.06 ops/ms
Warm-up #1: 7,041.18 ops/ms
Iteration #0: 91,877.2 ops/ms
Iteration #1: 78,932.6 ops/ms
Iteration #2: 90,539.2 ops/ms
Success: ~ 87,116.3 ops/ms ±9.2%
As you can see, the actual iteration values are roughly 10x the warm-up values. It's probably due to this: https://github.com/Kotlin/kotlinx-benchmark/blob/master/runtime/nativeMain/src/kotlinx/benchmark/native/NativeExecutor.kt#L92 Before dividing, should we convert the time to the unit specified by the benchmark?
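For context, a throughput computed from an elapsed time in raw nanoseconds has to be rescaled to the unit declared via @OutputTimeUnit before (or while) dividing. A minimal sketch of that conversion; the enum and function names here are illustrative, not the kotlinx-benchmark API:

```kotlin
// Hypothetical helper: convert elapsed nanoseconds into the benchmark's
// reported time unit before computing "operations per unit".
enum class TimeScale(val nanosPerUnit: Double) {
    NANOSECONDS(1.0),
    MICROSECONDS(1_000.0),
    MILLISECONDS(1_000_000.0),
    SECONDS(1_000_000_000.0)
}

fun throughput(operations: Long, elapsedNanos: Long, unit: TimeScale): Double =
    operations / (elapsedNanos / unit.nanosPerUnit)

fun main() {
    // 90_000 operations over 1 ms of elapsed time:
    println(throughput(90_000, 1_000_000, TimeScale.MILLISECONDS)) // 90000.0 ops/ms
}
```

If the divisor were left in nanoseconds while the label said ops/ms, every reported number would be off by a constant factor of 1,000,000, so a unit mix-up is a plausible suspect for a constant-factor discrepancy like this.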
As a side note: the JVM benchmarks use JMH while the native ones use a homemade implementation. Would it be possible to provide a single implementation for all platforms?
@qurbonzoda can you tell me which one is the correct value?
I believe the difference is related to the fact that in the warm-up phase the time is measured after each invocation, see the code. The warm-up phase allows us to estimate the number of invocations that fit into iterationTime. In the iteration phase, that information from warm-up is used to invoke the benchmark method a known number of times. Hence, if getTimeNanos() takes a significant amount of time compared to the benchmark method itself, the warm-up phase gives less accurate results. In brief, the quicker the benchmark method is, the less accurate warm-up becomes.
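The effect can be reproduced with a toy model of the two phases. This is a hedged sketch, not the actual NativeExecutor code: getTimeNanos and blackhole are local stand-ins, and System.nanoTime() substitutes for whatever clock the Kotlin/Native runtime uses:

```kotlin
fun getTimeNanos(): Long = System.nanoTime()

// Keeps the benchmark body from being optimized away entirely.
fun blackhole(x: Int) { if (x == Int.MIN_VALUE) println(x) }

// Warm-up style: the clock is read after every invocation, so the cost of the
// clock call is charged to each operation.
fun warmupThroughput(durationNanos: Long, benchmark: () -> Int): Pair<Long, Double> {
    var count = 0L
    val start = getTimeNanos()
    var now = start
    while (now - start < durationNanos) {
        blackhole(benchmark())
        now = getTimeNanos()   // read after each call: overhead is included
        count++
    }
    return count to count / ((now - start) / 1_000_000.0)
}

// Measurement style: run the invocation count learned during warm-up and read
// the clock only twice, so its overhead is amortized over all calls.
fun measuredThroughput(invocations: Long, benchmark: () -> Int): Double {
    val start = getTimeNanos()
    for (i in 0 until invocations) blackhole(benchmark())
    val elapsed = getTimeNanos() - start
    return invocations / (elapsed / 1_000_000.0)
}

fun main() {
    val a = 10
    val b = 20
    val (count, warmup) = warmupThroughput(200_000_000L) { a + b }
    val measured = measuredThroughput(count) { a + b }
    // For a body as cheap as a + b, `measured` comes out far higher than
    // `warmup`, matching the gap in the results above.
    println("warm-up: $warmup ops/ms, measured: $measured ops/ms")
}
```

The faster the benchmark body, the larger the share of each warm-up "operation" that is actually clock overhead, which is consistent with the measured iterations being the more trustworthy numbers.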
The JVM benchmarks use JMH while the native ones use a homemade implementation. Would it be possible to provide a single implementation for all platforms?
We try to use existing tools if they are of high quality. Having a single implementation for all platforms may not be a good idea, as the platforms run in different environments with specific characteristics. We plan to improve the K/N benchmarking implementation in upcoming releases, and this issue tracks that.