Julia: argument order has a great impact on performance with the maximum function

I noticed some very odd performance differences depending on the number and order of a function's arguments. In the following minimal example, the function test2, which takes additional unnecessary arguments in a specific order, is as fast as expected, while the other two functions (test1 without unnecessary arguments, and test3 with a different argument order) are unexpectedly slow.

However, it's not the line x = maximum(a, u) that is slow. The for loop itself is slow and allocates unexpectedly, but only if x is the result of the maximum function.

using BenchmarkTools


function test()
    a(u) = 0
    u = zeros(10000)

    println("No unnecessary arguments")
    # @code_warntype test1(u, a)
    display(@benchmark test1($u, $a))

    println("a at position 9")
    # @code_warntype test2(u, 0, 0, 0, 0, 0, 0, 0, a)
    display(@benchmark test2($u, $0, $0, $0, $0, $0, $0, $0, $a))

    println("a at position 8")
    # @code_warntype test3(u, 0, 0, 0, 0, 0, 0, a, 0)
    display(@benchmark test3($u, $0, $0, $0, $0, $0, $0, $a, $0))

    return nothing
end

function test1(u, a)
    x = maximum(a, u)

    @inbounds for i in eachindex(u)
        u[i] += x
    end
end

function test2(u, t1, t2, t3, t4, t5, t6, t7, a)
    x = maximum(a, u)

    @inbounds for i in eachindex(u)
        u[i] += x
    end
end

function test3(u, t1, t2, t3, t4, t5, t6, a, t7)
    x = maximum(a, u)

    @inbounds for i in eachindex(u)
        u[i] += x
    end
end

julia> test()
No unnecessary arguments
BenchmarkTools.Trial: 
  memory estimate:  460.77 KiB
  allocs estimate:  29489
  --------------
  minimum time:     400.200 μs (0.00% GC)
  median time:      445.300 μs (0.00% GC)
  mean time:        455.169 μs (1.87% GC)
  maximum time:     1.579 ms (71.73% GC) 
  --------------
  samples:          10000
  evals/sample:     1    

a at position 9
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.590 μs (0.00% GC)
  median time:      1.630 μs (0.00% GC)
  mean time:        1.630 μs (0.00% GC)
  maximum time:     2.410 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     10

a at position 8
BenchmarkTools.Trial: 
  memory estimate:  460.77 KiB
  allocs estimate:  29489
  --------------
  minimum time:     427.100 μs (0.00% GC)
  median time:      466.401 μs (0.00% GC)
  mean time:        476.053 μs (1.98% GC)
  maximum time:     1.632 ms (71.08% GC)
  --------------
  samples:          10000
  evals/sample:     1

This output was generated with Julia 1.5.0-beta1 on Windows 10. I got similar results with Julia 1.4.2, and we also tested this on Linux.

Jun 16 '20 12:06 efaulhaber

I can reproduce this. The allocations go away for test1 when x is annotated as x::Int = maximum(a, u). Code introspection (@code_warntype and @code_native) claims that, even without the type annotation, the function is fully inferred and properly compiled.
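
For reference, the annotation workaround could look roughly like the sketch below. The name test1_annotated is hypothetical and does not appear in the thread. The global definitions of a and u mirror the ones inside test(). No expected allocation count is given because it may vary between Julia versions.

```julia
# Hypothetical variant of test1: the result of maximum is declared as Int.
function test1_annotated(u, a)
    # The typed declaration converts the result to Int and fixes the type of x,
    # so the loop below works with a concretely typed value.
    x::Int = maximum(a, u)

    @inbounds for i in eachindex(u)
        u[i] += x
    end
    return u
end

a(v) = 0                 # same shape as the function defined inside test()
u = zeros(10_000)

test1_annotated(u, a)    # first call compiles the method
allocated = @allocated test1_annotated(u, a)
println("bytes allocated with annotation: ", allocated)
```

Note that the annotation only works when a returns values that can be converted to Int. A function returning 0.5, for example, would make the declaration throw an error.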

Jun 16 '20 13:06 jakobnissen

I think this is https://github.com/JuliaLang/julia/issues/32834 (at least the misleading information from code introspection). With a type parameter on a, the function is fine:

function test1(u, a::T) where T
    x = maximum(a, u)

    @inbounds for i in eachindex(u)
        u[i] += x
    end
end
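
A likely explanation, going by the Julia manual's performance tips, is that Julia may avoid specializing a method on a function-typed argument that is only passed through to another function (here, a is only handed to maximum), and that a type parameter forces specialization. The sketch below compares the two forms side by side. The names add_max_plain!, add_max_forced! and zero_fn are hypothetical and chosen only for illustration. No expected numbers are given since they depend on the Julia version.

```julia
# Hypothetical name: same body as test1, function argument left unannotated.
function add_max_plain!(u, f)
    x = maximum(f, u)
    @inbounds for i in eachindex(u)
        u[i] += x
    end
    return u
end

# Hypothetical name: identical body, but the type parameter F asks the
# compiler to specialize on the concrete type of f.
function add_max_forced!(u, f::F) where {F}
    x = maximum(f, u)
    @inbounds for i in eachindex(u)
        u[i] += x
    end
    return u
end

zero_fn(v) = 0           # stand-in for a(u) = 0 from the question
u = zeros(10_000)

# Call each method once so compilation is not included in the measurement.
add_max_plain!(u, zero_fn)
add_max_forced!(u, zero_fn)

plain_bytes  = @allocated add_max_plain!(u, zero_fn)
forced_bytes = @allocated add_max_forced!(u, zero_fn)
println("plain:  ", plain_bytes, " bytes")
println("forced: ", forced_bytes, " bytes")
```

For more stable timings, the same two calls can be passed to @benchmark as in the original post.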

Jun 16 '20 14:06 thofma