StaticArrays.jl icon indicating copy to clipboard operation
StaticArrays.jl copied to clipboard

hcat/vcat on large arrays not returning/hanging/very slow

Open Arryk opened this issue 4 years ago • 5 comments

Hi, I'm having a troublesome issue with vector of SVector. When I try to convert an Vector{SVector{3,Float64} into a Nx3 Matrix for plotting, Julia doesn't return if the vector is too long.

It seems that SVector require a lots of memory allocation for the job and 99% most of the time is used in wasted compilation time. (See example below) Normal Vector of Vector doesn't have this issue. It is possible the issue has already been flagged I didn't find it.

Normal Vectors:

julia> v = [zeros(3) for i in 1:10_000]
10000-element Vector{Vector{Float64}}:
 [0.0, 0.0, 0.0]
 ⋮
 [0.0, 0.0, 0.0]

julia>  @time hcat(v...)
  0.000528 seconds (7 allocations: 391.000 KiB)
3×10000 Matrix{Float64}:
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

Issue:

julia> using StaticArrays
julia> v = zeros(SVector{3,Float64}, 100)
julia> @time hcat(v...)

@time hcat(v...)
  5.367366 seconds (4.43 M allocations: 230.121 MiB, 0.62% gc time, 99.70% compilation time)
3×100 SMatrix{3, 100, Float64, 300} with indices SOneTo(3)×SOneTo(100):
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

if size is higher than 1000 the function just doesn't return anything.

Arryk avatar Aug 24 '21 16:08 Arryk

I suggest using reduce(hcat, v):

julia> @time reduce(hcat, v)
  0.000096 seconds (2 allocations: 234.453 KiB)
3×10000 Matrix{Float64}:
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0

Alternatively, https://github.com/mateuszbaran/HybridArrays.jl provides comparable performance without the need to convert.

mateuszbaran avatar Aug 24 '21 17:08 mateuszbaran

Thanks for the fast reply. That a good workaround.

I just thinking, this is not really a wanted behavior for quality of life. For example, if by mistake you call an hcat(v...) where v is large vector of SVector in a Pluto notebook on Windows. Then the whole workbook will have to be restarted and reactivity will work against you if you made that mistake.

Would it be feasible to have a simple automatic workaround built in? For example overriding hcat so that, when calling hcat(v...) it uses reduce(hcat,v) instead?

Arryk avatar Aug 25 '21 08:08 Arryk

It would be possible to introduce a cutoff, so that above a certain size hcat(v...) would return a Matrix instead of SMatrix which would solve the problem. I can give a few tips if someone would like to do this.

mateuszbaran avatar Aug 25 '21 08:08 mateuszbaran

I think that would be a good idea, that way we keep well simple code running in timely manner

Arryk avatar Aug 25 '21 08:08 Arryk

BTW, even for normal arrays there should probably be a number after which the splat calls the reduce(hcat, ...) method. Timing with xs::Tuple assuming you've already paid the splat penalty:

julia> xs = Tuple([rand(10) for _ in 1:10]); @btime hcat($xs...); @btime reduce(hcat, collect(xs));
  85.970 ns (1 allocation: 896 bytes)
  119.710 ns (2 allocations: 1.00 KiB)

julia> xs = Tuple([rand(10) for _ in 1:100]); @btime hcat($xs...); @btime reduce(hcat, collect(xs));
  2.421 μs (6 allocations: 11.25 KiB)
  723.595 ns (2 allocations: 8.81 KiB)

And timing the example above:

julia> v = [zeros(3) for i in 1:10_000];

julia> f(xs...) = reduce(hcat, collect(xs));

julia> @btime hcat($v...); @btime reduce(hcat, $v); @btime f($v...);
  155.834 μs (7 allocations: 390.97 KiB)
  53.250 μs (2 allocations: 234.42 KiB)  # without the splat
  92.166 μs (5 allocations: 390.84 KiB)  # first splat then reduce

mcabbott avatar Aug 25 '21 12:08 mcabbott