BenchmarkTools.jl

Unrealistic values of same tests after repeated executions

Open andreasvarga opened this issue 2 years ago • 8 comments

The following simple tests behave differently in a second execution.

using BenchmarkTools
using LinearAlgebra
a = rand(300,300); u = triu(rand(300,300));
@btime u*a;      # multiplication without exploiting upper triangular shape
@btime UpperTriangular(u)*a;     # multiplication with exploitation of upper triangular shape
@btime lmul!(UpperTriangular(u),a);  # in-place multiplication
@btime u*a;      # multiplication without exploiting upper triangular shape
@btime UpperTriangular(u)*a;     # multiplication with exploitation of upper triangular shape
@btime lmul!(UpperTriangular(u),a);  # in-place multiplication

These are the results following a fresh start of Julia (other executions produce different figures):

julia> @btime u*a;      # multiplication without exploiting upper triangular shape
  281.900 μs (2 allocations: 703.17 KiB)

julia> @btime UpperTriangular(u)*a;     # multiplication with exploitation of upper triangular shape
  167.200 μs (3 allocations: 703.19 KiB)

julia> @btime lmul!(UpperTriangular(u),view(a,:,:));  # in-place multiplication
  845.800 μs (5 allocations: 176 bytes)

julia> @btime u*a;      # multiplication without exploiting upper triangular shape
  3.432 ms (2 allocations: 703.17 KiB)

julia> @btime UpperTriangular(u)*a;     # multiplication with exploitation of upper triangular shape
  3.011 ms (3 allocations: 703.19 KiB)

julia> @btime lmul!(UpperTriangular(u),view(a,:,:));  # in-place multiplication
  3.045 ms (5 allocations: 176 bytes)

Some values show a more than tenfold slowdown. Is there any explanation for this behaviour?

julia> versioninfo()
Julia Version 1.8.3
Commit 0434deb161 (2022-11-14 20:14 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 16 × Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 1 on 16 virtual cores

andreasvarga avatar May 01 '23 09:05 andreasvarga

Can you show the full result with @benchmark?

vchuravy avatar May 01 '23 12:05 vchuravy

I repeatedly ran the following test:

using BenchmarkTools
using LinearAlgebra
a = rand(300,300); u = triu(rand(300,300));
@benchmark UpperTriangular(u)*a     # multiplication with exploitation of upper triangular shape
@benchmark lmul!(UpperTriangular(u),a)  # in-place multiplication
@benchmark UpperTriangular(u)*a     # multiplication with exploitation of upper triangular shape
@benchmark lmul!(UpperTriangular(u),a)  # in-place multiplication

and here are the results for one of the executions:

[Screenshot 2023-05-01 195522]

There are significant differences between the timings reported by the two runs of @benchmark UpperTriangular(u)*a (multiplication with exploitation of upper triangular shape).

andreasvarga avatar May 01 '23 18:05 andreasvarga

So the only thing that comes to mind is that lmul! will modify u. So it might actually hit different code paths since you did do lmul! ~2k times?

vchuravy avatar May 02 '23 16:05 vchuravy

u is not modified, but a certainly is:

julia> using BenchmarkTools
julia> using LinearAlgebra
julia> a = rand(300,300); u = triu(rand(300,300));
julia> A = copy(a); U = copy(u);
julia> @benchmark lmul!(UpperTriangular(u),a)  # in-place multiplication
BenchmarkTools.Trial: 5308 samples with 1 evaluation.
 Range (min … max):  690.200 μs …   2.511 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     946.750 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   937.349 μs ± 152.299 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%
 Memory estimate: 16 bytes, allocs estimate: 1.

julia> norm(u-U)
0.0

julia> norm(a-A)
1.8061592552120448e106

I wonder what influence this could have on the time estimates.
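To make the mutation concrete: each call to lmul! overwrites a with u*a, so successive calls no longer operate on the original data. The following is only a minimal sketch of that effect; the loop count of 5 is arbitrary, and it does not by itself establish what causes the timing differences.

```julia
using LinearAlgebra

a = rand(300, 300); u = triu(rand(300, 300));
A = copy(a)   # pristine copy of the right-hand operand

# Each call overwrites a in place, so the k-th call computes u^k * A.
for k in 1:5
    lmul!(UpperTriangular(u), a)
    println("after $k calls: norm(a) = ", norm(a))
end

# Restoring the operand before a call gives that call the original input again.
copyto!(a, A)
lmul!(UpperTriangular(u), a)
println(a ≈ UpperTriangular(u) * A)
```

Since @benchmark calls the expression thousands of times, the contents of a at the end of one benchmark are the starting point of the next one.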

Is there any way to ask for a specified number of tests? In some runs there are 10000 executions, in others only about 1500. What is the logic behind this choice? With a smaller number of samples, the execution times are usually larger!

julia> @benchmark lmul!(UpperTriangular(u),a)  # in-place multiplication
BenchmarkTools.Trial: 1164 samples with 1 evaluation.
 Range (min … max):  4.223 ms …  4.903 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.278 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.295 ms ± 94.512 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

andreasvarga avatar May 03 '23 08:05 andreasvarga

BenchmarkTools has a time limit of about 5 s and an upper limit on the number of samples it will take, so slower evaluations yield fewer samples within that budget.

https://juliaci.github.io/BenchmarkTools.jl/stable/manual/#Benchmark-Parameters
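As a rough sketch of how these parameters can be set per benchmark: samples, seconds and evals are keyword parameters accepted by @benchmark (and @btime). The specific values below (500 samples, 2 s) are arbitrary choices for illustration only.

```julia
using BenchmarkTools
using LinearAlgebra

a = rand(300, 300); u = triu(rand(300, 300));

# Collect at most 500 samples within a 2 s budget, one evaluation per sample.
# u and a are interpolated with $ so that global-variable access is not timed.
t = @benchmark UpperTriangular($u) * $a samples=500 seconds=2 evals=1

println(length(t.times), " samples collected")

# The session-wide defaults can be changed in a similar way, e.g.:
# BenchmarkTools.DEFAULT_PARAMETERS.seconds = 10
```

Sampling stops as soon as either limit is reached, which is why the number of samples differs between fast and slow runs.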

vchuravy avatar May 03 '23 14:05 vchuravy

I think this is related to mutation, so it should be fixed if you provide a setup for each benchmark and set evals=1.
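A minimal sketch of what that could look like for the in-place case, assuming the same u and a as in the earlier snippets. The names x and a0 are introduced here purely for illustration.

```julia
using BenchmarkTools
using LinearAlgebra

a = rand(300, 300); u = triu(rand(300, 300));
a0 = copy(a)   # reference copy, only used to confirm that a stays untouched

# setup runs before every sample; with evals=1 each sample is a single call,
# so lmul! always receives a fresh copy x of a and a itself is never modified.
t = @benchmark lmul!(UpperTriangular($u), x) setup=(x = copy($a)) evals=1

println(a == a0)
```

Without evals=1, several evaluations could share one setup call, and the later evaluations in a sample would again see already-modified data.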

gdalle avatar Jun 13 '23 15:06 gdalle

At least if we solve #24

gdalle avatar Jun 13 '23 15:06 gdalle

Can you check if this still happens on the master branch? Our latest PR #318 should have fixed this.

gdalle avatar Jun 20 '23 10:06 gdalle