StaticArrays.jl

floor, ceil, round called with (::Type{T}, ::SArray) allocates

Open jaemolihm opened this issue 3 years ago • 8 comments

using BenchmarkTools, StaticArrays
x = SVector(0., 1.)
@btime round.(Int, $x) # 11.817 ns (1 allocation: 16 bytes)
@btime floor.(Int, $x) # 11.723 ns (1 allocation: 16 bytes)
@btime ceil.(Int, $x) # 11.754 ns (1 allocation: 16 bytes)

There seems to be no type instability.

julia> @code_warntype round.(Int, x)
MethodInstance for (::var"##dotfunction#439#69")(::Type{Int64}, ::SVector{2, Float64})
  from (::var"##dotfunction#439#69")(x1, x2) in Main
Arguments
  #self#::Core.Const(var"##dotfunction#439#69"())
  x1::Core.Const(Int64)
  x2::SVector{2, Float64}
Body::SVector{2, Int64}
1 ─ %1 = Base.broadcasted(Main.round, x1, x2)::Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1}, Nothing, typeof(round), Tuple{Base.RefValue{Type{Int64}}, SVector{2, Float64}}}
│   %2 = Base.materialize(%1)::SVector{2, Int64}
└──      return %2
julia> versioninfo()
Julia Version 1.7.0
Commit 3bf9d17731 (2021-11-30 12:12 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, haswell)
Environment:
  JULIA = /home/jmlim/appl/julia-1.7.0/bin/julia

(@v1.7) pkg> status StaticArrays
      Status `~/.julia/environments/v1.7/Project.toml`
  [90137ffa] StaticArrays v1.3.4

jaemolihm avatar Feb 10 '22 08:02 jaemolihm

Probably just due to benchmarking broadcast in global scope?

julia> x = SVector(0., 1.);

julia> function f(x)
           return round.(Int, x)
       end;

julia> @btime f($(Ref(x))[])
  3.679 ns (0 allocations: 0 bytes)
2-element SVector{2, Int64} with indices SOneTo(2):
 0
 1

fredrikekre avatar Feb 10 '22 09:02 fredrikekre

Is that also a concern? Never heard of that one before.

DNF2 avatar Feb 10 '22 09:02 DNF2

Is the following still considered global scope? (Sorry, I am not very familiar with Julia vocabulary.)

function test()
    x = SVector(0., 1.)
    @btime round.(Int, $x)
end
test()
# 11.734 ns (1 allocation: 16 bytes)

jaemolihm avatar Feb 10 '22 09:02 jaemolihm

I believe that is more or less equivalent, since @btime runs in the global scope.

fredrikekre avatar Feb 10 '22 09:02 fredrikekre

Ok, thanks a lot. My actual problem looked more like the code below. Why does run0 allocate on every iteration while run1 does not?

julia> module A
           using StaticArrays
           f(x) = round.(Int, x)
           function run0(N, x)
               y = zero(x)
               for i in 1:N
                   y += round.(Int, x)
               end
           end
           function run1(N, x)
               y = zero(x)
               for i in 1:N
                   y += f(x)
               end
           end
       end

julia> @time A.run0(100000, SVector(1., 2.))
  0.002083 seconds (100.00 k allocations: 1.526 MiB)

julia> @time A.run1(100000, SVector(1., 2.))
  0.000001 seconds

jaemolihm avatar Feb 10 '22 09:02 jaemolihm

That's a Julia broadcasting thing. This doesn't allocate:

julia> function run2(N, x)
           y = zero(x)
           for i in 1:N
               y += map(z -> round(Int, z), x)
           end
       end

julia> @btime run2(100000, SVector(1., 2.))
  6.359 ns (0 allocations: 0 bytes)

The broadcasting machinery wraps types in a Ref, which allocates:

julia> Broadcast.broadcastable(Int)
Base.RefValue{Type{Int64}}(Int64)

So there is nothing that can be done about it in StaticArrays.jl.
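
A minimal sketch of the idea in plain Base Julia, using a tuple as a stand-in for an SVector (an assumption, made so the snippet needs no packages). Capturing `Int` in a closure means broadcast never receives the type as an argument, so no Ref is created for it. Whether any of these forms allocates depends on the Julia version, so no timings or allocation counts are claimed here.

```julia
# Broadcasting treats a type argument as a scalar by wrapping it in a Ref.
wrapped = Broadcast.broadcastable(Int)
println(typeof(wrapped))

t = (0.4, 1.6)   # stand-in for an SVector; broadcasting over a tuple returns a tuple

via_broadcast = round.(Int, t)               # Int is passed through broadcast
via_map       = map(z -> round(Int, z), t)   # Int is captured in the closure
via_closure   = (z -> round(Int, z)).(t)     # broadcast without a type argument

# All three forms should produce the same rounded values.
println(via_broadcast == via_map == via_closure)
```

The closure form keeps the dot syntax, which may be convenient when the call is part of a larger fused broadcast expression.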

mateuszbaran avatar Feb 10 '22 09:02 mateuszbaran

This is a regression in 1.7, at least on my machine:

In 1.6.3:

% ~/programs/julia/julia-1.6.3/bin/julia 
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.3 (2021-09-23)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> f(x) = round.(Int,x)
f (generic function with 1 method)

julia> using StaticArrays

julia> x = rand(SVector{3,Float64})
3-element SVector{3, Float64} with indices SOneTo(3):
 0.41471376346996136
 0.9337726239371802
 0.10360192859349593

julia> @btime f($x)
  0.020 ns (0 allocations: 0 bytes)
3-element SVector{3, Int64} with indices SOneTo(3):
 0
 1
 0


In 1.7.2:

% julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.7.2 (2022-02-06)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> f(x) = round.(Int,x)
f (generic function with 1 method)

julia> using StaticArrays

julia> x = rand(SVector{3,Float64})
3-element SVector{3, Float64} with indices SOneTo(3):
 0.3033270425954926
 0.6372267959474416
 0.2737497769441821

julia> @btime f($x)
  11.830 ns (1 allocation: 16 bytes)
3-element SVector{3, Int64} with indices SOneTo(3):
 0
 1
 0

lmiq avatar Feb 10 '22 11:02 lmiq

Note that BenchmarkTools changed: in the first example above it was constant-propagating the value, but not in the second example. That explains the different timings for tuples mentioned in the (closed) issue above.

However, that does not explain the allocation in 1.7.2 when using StaticArrays. Now with BenchmarkTools 1.3.0 in all settings, I get:

In 1.6.3:

julia> versioninfo()
Julia Version 1.6.3
Commit ae8452a9e0 (2021-09-23 17:34 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)

julia> x = rand(SVector{3,Float64});

julia> @btime f($x)
  15.818 ns (0 allocations: 0 bytes)
3-element SVector{3, Int64} with indices SOneTo(3):
 1
 0
 0

In 1.7.2:

julia> versioninfo()
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)

julia> f(x) = round.(Int,x)
f (generic function with 1 method)

julia> using StaticArrays

julia> x = rand(SVector{3,Float64});

julia> @btime f($x)
  12.267 ns (1 allocation: 16 bytes)
3-element SVector{3, Int64} with indices SOneTo(3):
 0
 1
 1

Since static arrays are "just tuples", there is a regression associated with the package here.

edit: I've tested versions 1.2.13 and 1.3.4 of StaticArrays in both Julia 1.6.3 and 1.7.2 and the issue remains in all settings. Thus, it is not a regression introduced by StaticArrays alone; it is correlated with some change in Julia itself.

edit2: the allocation occurs for static vectors of length 3, but not for those of length 2.
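
One way to check the length dependence without BenchmarkTools is to measure allocations from inside a function. This is only a sketch: it assumes StaticArrays is installed, `rounded` and `allocated_bytes` are hypothetical helper names, and the reported byte counts will depend on the Julia version, so none are shown.

```julia
using StaticArrays

# Hypothetical helper: the broadcast call under test.
rounded(x) = round.(Int, x)

# Hypothetical helper: bytes allocated by one call to `rounded`.
function allocated_bytes(x)
    rounded(x)                      # warm-up call so compilation is not measured
    return @allocated rounded(x)
end

for N in (2, 3)
    x = rand(SVector{N,Float64})
    println("length ", N, ": ", allocated_bytes(x), " bytes")
end
```

Because the measurement happens inside `allocated_bytes`, the value of `x` is an ordinary function argument rather than something interpolated by a benchmarking macro.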

lmiq avatar Feb 10 '22 12:02 lmiq