Argument order has a large impact on performance with the maximum function
I noticed some very odd performance behavior that depends on the number and order of a function's arguments. In the following minimal example, the function test2, which takes additional unused arguments in a specific order, is as fast as expected, while the other two functions (one without the unused arguments, one with a different argument order) are unexpectedly slow.
However, it is not the line x = maximum(a, u) that is slow. The for loop itself is slow and allocates unexpectedly, but only if x is the result of the maximum function.
using BenchmarkTools
function test()
    a(u) = 0
    u = zeros(10000)
    println("No unnecessary arguments")
    # @code_warntype test1(u, a)
    display(@benchmark test1($u, $a))
    println("a at position 9")
    # @code_warntype test2(u, 0, 0, 0, 0, 0, 0, 0, a)
    display(@benchmark test2($u, $0, $0, $0, $0, $0, $0, $0, $a))
    println("a at position 8")
    # @code_warntype test3(u, 0, 0, 0, 0, 0, 0, a, 0)
    display(@benchmark test3($u, $0, $0, $0, $0, $0, $0, $a, $0))
    return nothing
end
function test1(u, a)
    x = maximum(a, u)
    @inbounds for i in eachindex(u)
        u[i] += x
    end
end
function test2(u, t1, t2, t3, t4, t5, t6, t7, a)
    x = maximum(a, u)
    @inbounds for i in eachindex(u)
        u[i] += x
    end
end
function test3(u, t1, t2, t3, t4, t5, t6, a, t7)
    x = maximum(a, u)
    @inbounds for i in eachindex(u)
        u[i] += x
    end
end
julia> test()
No unnecessary arguments
BenchmarkTools.Trial:
memory estimate: 460.77 KiB
allocs estimate: 29489
--------------
minimum time: 400.200 μs (0.00% GC)
median time: 445.300 μs (0.00% GC)
mean time: 455.169 μs (1.87% GC)
maximum time: 1.579 ms (71.73% GC)
--------------
samples: 10000
evals/sample: 1
a at position 9
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.590 μs (0.00% GC)
median time: 1.630 μs (0.00% GC)
mean time: 1.630 μs (0.00% GC)
maximum time: 2.410 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 10
a at position 8
BenchmarkTools.Trial:
memory estimate: 460.77 KiB
allocs estimate: 29489
--------------
minimum time: 427.100 μs (0.00% GC)
median time: 466.401 μs (0.00% GC)
mean time: 476.053 μs (1.98% GC)
maximum time: 1.632 ms (71.08% GC)
--------------
samples: 10000
evals/sample: 1
This output was generated using Julia 1.5.0-beta1 on Windows 10. I got similar results using Julia 1.4.2, and we also tested this on Linux.
I can reproduce this. The allocations go away for test1 when x is annotated, as in x::Int = maximum(a, u). Code introspection (@code_warntype and @code_native) claims that even without the type annotation the function is fully inferred and properly compiled.
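For reference, a minimal sketch of that annotation in context. The names test1_annotated and always_one are hypothetical and not from the original post; always_one stands in for a(u) = 0 so that the effect on u is visible.

```julia
# Hypothetical variant of test1 with the local variable x annotated as Int.
function test1_annotated(u, a)
    # The result of maximum is converted to Int, so x has a known
    # type inside the loop regardless of how the call was compiled.
    x::Int = maximum(a, u)
    @inbounds for i in eachindex(u)
        u[i] += x
    end
    return nothing
end

always_one(v) = 1   # hypothetical stand-in for a(u) = 0

u = zeros(10_000)
test1_annotated(u, always_one)
```

Note that the annotation only works when a returns something convertible to Int; it throws an error for a function returning, say, 0.5.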
I think this is https://github.com/JuliaLang/julia/issues/32834 (at least as far as the misleading information from code introspection is concerned). The following version, which adds a type parameter for a, is fine:
function test1(u, a::T) where T
    x = maximum(a, u)
    @inbounds for i in eachindex(u)
        u[i] += x
    end
end
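A likely explanation, going by the Julia performance tips, is that Julia may skip specializing on a Function argument that is only passed through to another function and never called directly. Adding a type parameter forces specialization. Since introspection can be misleading here, one way to compare the two variants is to measure allocations directly. The sketch below assumes hypothetical names (add_max_plain!, add_max_param!, always_one, measure), and the numbers it reports may differ between Julia versions.

```julia
# Hypothetical names; both functions mirror test1 from the original post.

# The function argument f is only passed through to maximum.
function add_max_plain!(u, f)
    x = maximum(f, u)
    @inbounds for i in eachindex(u)
        u[i] += x
    end
    return nothing
end

# The type parameter F forces specialization on the type of f.
function add_max_param!(u, f::F) where {F}
    x = maximum(f, u)
    @inbounds for i in eachindex(u)
        u[i] += x
    end
    return nothing
end

always_one(v) = 1   # hypothetical stand-in for a(u) = 0

function measure()
    u = zeros(10_000)
    # Call each function once first so compilation is not measured.
    add_max_plain!(u, always_one)
    add_max_param!(u, always_one)
    plain = @allocated add_max_plain!(u, always_one)
    param = @allocated add_max_param!(u, always_one)
    return plain, param, u
end

plain, param, u = measure()
println("plain: ", plain, " bytes, parametric: ", param, " bytes")
```

If the specialization heuristic is the cause, the plain variant should report allocations and the parametric one should not, but treat this as something to check on your own Julia version.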