Weird result for warmup vs actual iterations for native benchmarks

This is the benchmark class:

import kotlinx.benchmark.*

@State(Scope.Benchmark)
@OutputTimeUnit(BenchmarkTimeUnit.MILLISECONDS)
@Measurement(time = 1, timeUnit = BenchmarkTimeUnit.SECONDS)
@BenchmarkMode(Mode.Throughput)
class SimpleOps {
  private var a = 0
  private var b = 0

  @Setup
  fun setup() {
    a = 10
    b = 20
  }

  @Benchmark
  fun add(): Int {
    return a + b
  }

  @Benchmark
  fun sub(): Int {
    return a - b
  }

  @Benchmark
  fun mul(): Int {
    return a * b
  }

  @Benchmark
  fun div(): Int {
    return a / b
  }
}

The results:

… benchmark.SimpleOps.add
Warm-up #0: 6,829.74 ops/ms
Warm-up #1: 7,094.42 ops/ms
Iteration #0: 89,446.7 ops/ms
Iteration #1: 92,966.6 ops/ms
Iteration #2: 83,125.1 ops/ms
  Success:   ~ 88,513.1 ops/ms ±6.4%

… benchmark.SimpleOps.div
Warm-up #0: 6,988.84 ops/ms
Warm-up #1: 6,935.19 ops/ms
Iteration #0: 69,176.1 ops/ms
Iteration #1: 69,718.8 ops/ms
Iteration #2: 82,222.2 ops/ms
  Success:   ~ 73,706.0 ops/ms ±11%

… benchmark.SimpleOps.mul
Warm-up #0: 5,537.26 ops/ms
Warm-up #1: 5,530.41 ops/ms
Iteration #0: 21,964.3 ops/ms
Iteration #1: 22,796.2 ops/ms
Iteration #2: 21,789.4 ops/ms
  Success:   ~ 22,183.3 ops/ms ±2.7%

… benchmark.SimpleOps.sub
Warm-up #0: 7,040.06 ops/ms
Warm-up #1: 7,041.18 ops/ms
Iteration #0: 91,877.2 ops/ms
Iteration #1: 78,932.6 ops/ms
Iteration #2: 90,539.2 ops/ms
  Success:   ~ 87,116.3 ops/ms ±9.2%

As you can see, the actual iteration values are roughly 10x the warm-up values. It's probably due to this: https://github.com/Kotlin/kotlinx-benchmark/blob/master/runtime/nativeMain/src/kotlinx/benchmark/native/NativeExecutor.kt#L92 Before dividing, should we convert the time to the unit specified by the benchmark?
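
If that division uses raw nanoseconds while the result is labelled with the benchmark's output unit, the reported throughput would be off by a constant factor. Here is a minimal, self-contained sketch of the conversion I would expect; the enum and function names are hypothetical, not the actual NativeExecutor API:

// Hypothetical names, for illustration only.
enum class SketchTimeUnit(val nanosPerUnit: Double) {
  NANOSECONDS(1.0),
  MILLISECONDS(1_000_000.0),
  SECONDS(1_000_000_000.0)
}

// Convert elapsed nanoseconds into the benchmark's output unit *before*
// dividing, so that "ops/ms" really means operations per millisecond.
fun opsPerUnit(invocations: Long, elapsedNanos: Long, unit: SketchTimeUnit): Double =
  invocations / (elapsedNanos / unit.nanosPerUnit)

fun main() {
  // 1,000,000 invocations measured over 100 ms (1e8 ns):
  println(opsPerUnit(1_000_000, 100_000_000, SketchTimeUnit.MILLISECONDS)) // 10000.0 ops/ms
}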

As a side note: JVM benchmarks use JMH, while the native ones use a homemade implementation. Would it be possible to provide a single implementation for all platforms?

raniejade avatar Mar 11 '20 09:03 raniejade

@qurbonzoda can you tell me which one is the correct value?

raniejade avatar Mar 13 '20 13:03 raniejade

I believe the difference is related to the fact that in the warm-up phase the time is measured after each invocation, see the code. The warm-up phase lets us estimate how many invocations fit into iterationTime, while in the iteration phase that estimate is used to invoke the benchmark method a known number of times (see the sketch below).
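
A minimal sketch of the two strategies (assumed structure, not the actual NativeExecutor code; kotlin.time is used here as a stand-in for the runtime's getTimeNanos()):

import kotlin.time.TimeSource

// Warm-up style: the clock is read on every loop pass, so each sample
// includes the cost of the clock read itself.
fun warmupCount(iterationTimeNanos: Long, benchmark: () -> Unit): Long {
  val start = TimeSource.Monotonic.markNow()
  var invocations = 0L
  while (start.elapsedNow().inWholeNanoseconds < iterationTimeNanos) {
    benchmark()
    invocations++
  }
  return invocations
}

// Iteration style: the count estimated during warm-up is replayed and the
// clock is read only twice, so its overhead is amortized across all calls.
fun measuredIteration(invocations: Long, benchmark: () -> Unit): Long {
  val start = TimeSource.Monotonic.markNow()
  var i = 0L
  while (i < invocations) {
    benchmark()
    i++
  }
  return start.elapsedNow().inWholeNanoseconds
}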

Hence, if getTimeNanos() takes a significant amount of time compared to the benchmark method, the warm-up phase gives less accurate results. In brief, the quicker the benchmark method is, the less accurate the warm-up becomes.
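
To put assumed numbers on it (illustrative, not measured): if the benchmark body takes ~10 ns and each clock read costs ~90 ns, the warm-up loop observes ~100 ns per invocation and reports roughly 10x lower throughput, which is consistent with the gap in the results above. A quick way to estimate the clock-read cost on a given platform:

import kotlin.time.TimeSource

fun main() {
  val reads = 1_000_000
  val start = TimeSource.Monotonic.markNow()
  repeat(reads) { TimeSource.Monotonic.markNow() }
  // Average cost of one clock read; the exact figure is platform-dependent.
  println("~${start.elapsedNow().inWholeNanoseconds / reads} ns per clock read")
}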

qurbonzoda avatar Aug 22 '20 02:08 qurbonzoda

JVM benchmarks use JMH, while the native ones use a homemade implementation. Would it be possible to provide a single implementation for all platforms?

We try to use existing tools when they are of high quality. Having a single implementation for all platforms may not be a good idea, as the platforms run in different environments with specific characteristics. We plan to improve the K/N benchmarking implementation in coming releases, and this issue tracks that.

qurbonzoda avatar Aug 22 '20 02:08 qurbonzoda