BenchmarkTools.jl
Unrealistic values of same tests after repeated executions
The following simple tests behave differently in a second execution.
using BenchmarkTools
using LinearAlgebra
a = rand(300,300); u = triu(rand(300,300));
@btime u*a; # multiplication without exploiting upper triangular shape
@btime UpperTriangular(u)*a; # multiplication with exploitation of upper triangular shape
@btime lmul!(UpperTriangular(u),a); # in-place multiplication
@btime u*a; # multiplication without exploiting upper triangular shape
@btime UpperTriangular(u)*a; # multiplication with exploitation of upper triangular shape
@btime lmul!(UpperTriangular(u),a); # in-place multiplication
These are the results following a fresh start of Julia (other executions produce different figures):
julia> @btime u*a; # multiplication without exploiting upper triangular shape
281.900 μs (2 allocations: 703.17 KiB)
julia> @btime UpperTriangular(u)*a; # multiplication with exploitation of upper triangular shape
167.200 μs (3 allocations: 703.19 KiB)
julia> @btime lmul!(UpperTriangular(u),view(a,:,:)); # in-place multiplication
845.800 μs (5 allocations: 176 bytes)
julia> @btime u*a; # multiplication without exploiting upper triangular shape
3.432 ms (2 allocations: 703.17 KiB)
julia> @btime UpperTriangular(u)*a; # multiplication with exploitation of upper triangular shape
3.011 ms (3 allocations: 703.19 KiB)
julia> @btime lmul!(UpperTriangular(u),view(a,:,:)); # in-place multiplication
3.045 ms (5 allocations: 176 bytes)
Some values show a more than 10-fold slowdown. Is there any explanation for this behaviour?
julia> versioninfo()
Julia Version 1.8.3
Commit 0434deb161 (2022-11-14 20:14 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 16 × Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
Threads: 1 on 16 virtual cores
Can you show the full result with @benchmark?
I repeatedly performed the following test:
using BenchmarkTools
using LinearAlgebra
a = rand(300,300); u = triu(rand(300,300));
@benchmark UpperTriangular(u)*a # multiplication with exploitation of upper triangular shape
@benchmark lmul!(UpperTriangular(u),a) # in-place multiplication
@benchmark UpperTriangular(u)*a # multiplication with exploitation of upper triangular shape
@benchmark lmul!(UpperTriangular(u),a) # in-place multiplication
and here are the results for one of the executions:

There are significant differences in the time evaluations for
@benchmark UpperTriangular(u)*a # multiplication with exploitation of upper triangular shape
The only thing that comes to mind is that lmul! will modify u, so it might actually hit different code paths since you did run lmul! ~2k times?
u is not modified, but a certainly is:
julia> using BenchmarkTools
julia> using LinearAlgebra
julia> a = rand(300,300); u = triu(rand(300,300));
julia> A = copy(a); U = copy(u);
julia> @benchmark lmul!(UpperTriangular(u),a) # in-place multiplication
BenchmarkTools.Trial: 5308 samples with 1 evaluation.
Range (min … max): 690.200 μs … 2.511 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 946.750 μs ┊ GC (median): 0.00%
Time (mean ± σ): 937.349 μs ± 152.299 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
Memory estimate: 16 bytes, allocs estimate: 1.
julia> norm(u-U)
0.0
julia> norm(a-A)
1.8061592552120448e106
I wonder what influence this could have on the time estimations.
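One way to see the effect (a sketch, not taken from the thread; the loop count of 1000 is arbitrary and stands in for the benchmark's repeated samples, and a0 is just a name for a saved copy): every sample of the in-place benchmark multiplies the same a again, so later samples effectively time the kernel on repeatedly scaled data rather than on the original matrix.
using LinearAlgebra
a = rand(300,300); u = triu(rand(300,300));
a0 = copy(a);                        # keep the original data for comparison
for _ in 1:1000                      # stand-in for ~1000 benchmark samples
    lmul!(UpperTriangular(u), a)     # a is overwritten with UpperTriangular(u)*a
end
maximum(abs, a0), maximum(abs, a)    # the entry magnitudes drift far from the original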
Is there any way to ask for a specified number of tests? In some runs there are 10000 samples, in others only about 1500. What is the logic behind this choice? For a smaller number of samples, the execution times are usually larger!
julia> @benchmark lmul!(UpperTriangular(u),a) # in-place multiplication
BenchmarkTools.Trial: 1164 samples with 1 evaluation.
Range (min … max): 4.223 ms … 4.903 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 4.278 ms ┊ GC (median): 0.00%
Time (mean ± σ): 4.295 ms ± 94.512 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
BenchmarkTools has a time limit of about 5 s and an upper bound on the number of samples it will take.
https://juliaci.github.io/BenchmarkTools.jl/stable/manual/#Benchmark-Parameters
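For example (illustrative values, not from the thread), the sample budget and the time budget can be passed as keyword parameters to @benchmark, or changed globally via the default parameters:
using BenchmarkTools
using LinearAlgebra
a = rand(300,300); u = triu(rand(300,300));
# Ask for at most 2000 samples and allow up to 20 seconds of measurement time.
@benchmark UpperTriangular(u)*a samples=2000 seconds=20
# The defaults can also be changed for all subsequent benchmarks:
BenchmarkTools.DEFAULT_PARAMETERS.samples = 2000
BenchmarkTools.DEFAULT_PARAMETERS.seconds = 20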
I think this is related to the mutation, so it should be fixed if you provide a setup for each benchmark and set evals=1.
At least if we solve #24
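A minimal sketch of that suggestion (assuming the same a and u as above): the setup expression rebuilds a fresh copy before every sample, and evals=1 ensures that copy is not reused across evaluations, so lmul! always sees unmutated data.
using BenchmarkTools
using LinearAlgebra
a = rand(300,300); u = triu(rand(300,300));
# setup creates a fresh copy of `a` before every sample; evals=1 keeps each
# sample to a single evaluation, so the copy is never multiplied twice.
@benchmark lmul!(UpperTriangular(u), A) setup=(A = copy(a)) evals=1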
Can you check if this still happens on the master branch? Our latest PR #318 should have fixed this.