hcat/vcat on large arrays not returning/hanging/very slow
Hi, I'm having a troublesome issue with a vector of SVectors. When I try to convert a Vector{SVector{3,Float64}} into a 3×N Matrix for plotting, Julia doesn't return if the vector is too long.
It seems that SVector requires a lot of memory allocation for the job, and over 99% of the time is wasted on compilation. (See example below.) A normal Vector of Vectors doesn't have this issue. It's possible the issue has already been flagged, but I didn't find it.
Normal Vectors:
julia> v = [zeros(3) for i in 1:10_000]
10000-element Vector{Vector{Float64}}:
[0.0, 0.0, 0.0]
⋮
[0.0, 0.0, 0.0]
julia> @time hcat(v...)
0.000528 seconds (7 allocations: 391.000 KiB)
3×10000 Matrix{Float64}:
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Issue:
julia> using StaticArrays
julia> v = zeros(SVector{3,Float64}, 100)
julia> @time hcat(v...)
5.367366 seconds (4.43 M allocations: 230.121 MiB, 0.62% gc time, 99.70% compilation time)
3×100 SMatrix{3, 100, Float64, 300} with indices SOneTo(3)×SOneTo(100):
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
If the vector is longer than about 1000 elements, the call just never returns.
I suggest using reduce(hcat, v):
julia> @time reduce(hcat, v)
0.000096 seconds (2 allocations: 234.453 KiB)
3×10000 Matrix{Float64}:
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Alternatively, https://github.com/mateuszbaran/HybridArrays.jl provides comparable performance without the need to convert.
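For completeness, here is roughly how the reduce(hcat, v) workaround looks for the SVector case itself (a minimal sketch; the permutedims step is only an assumption about the N×3 layout wanted for plotting):

using StaticArrays

v = zeros(SVector{3,Float64}, 10_000)   # vector of static 3-vectors

M = reduce(hcat, v)      # plain 3×10000 Matrix{Float64}, no splatting involved
Mt = permutedims(M)      # 10000×3 Matrix{Float64}, if rows-as-points is needed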
Thanks for the fast reply. That's a good workaround.
I was just thinking: this isn't really desirable behavior from a quality-of-life perspective. For example, if you mistakenly call hcat(v...) where v is a large vector of SVectors in a Pluto notebook on Windows, the whole notebook has to be restarted, and reactivity will work against you after that mistake.
Would it be feasible to have a simple automatic workaround built in? For example, overriding hcat so that calling hcat(v...) uses reduce(hcat, v) instead?
It would be possible to introduce a cutoff, so that above a certain size hcat(v...) would return a Matrix instead of an SMatrix, which would solve the problem. I can give a few tips if someone would like to do this.
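Just to illustrate the idea (a hand-rolled sketch, not StaticArrays' actual code; the helper name and the cutoff value are made up), a user-side version of such a cutoff could look like:

using StaticArrays

const HCAT_CUTOFF = 32   # purely illustrative threshold

# Hypothetical helper: keep the static SMatrix for small splats,
# fall back to a plain Matrix when the number of columns gets large.
function hcat_cutoff(vs::SVector{N,T}...) where {N,T}
    ncols = length(vs)
    ncols <= HCAT_CUTOFF && return hcat(vs...)   # small: fully static result
    out = Matrix{T}(undef, N, ncols)             # large: avoid huge static types
    for (j, v) in enumerate(vs)
        out[:, j] = v
    end
    return out
end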
I think that would be a good idea; that way simple code keeps running in a timely manner.
BTW, even for normal arrays there should probably be a threshold above which the splatted call falls back to the reduce(hcat, ...) method. Timing with xs::Tuple, assuming you've already paid the splat penalty:
julia> xs = Tuple([rand(10) for _ in 1:10]); @btime hcat($xs...); @btime reduce(hcat, collect(xs));
85.970 ns (1 allocation: 896 bytes)
119.710 ns (2 allocations: 1.00 KiB)
julia> xs = Tuple([rand(10) for _ in 1:100]); @btime hcat($xs...); @btime reduce(hcat, collect(xs));
2.421 μs (6 allocations: 11.25 KiB)
723.595 ns (2 allocations: 8.81 KiB)
And timing the example above:
julia> v = [zeros(3) for i in 1:10_000];
julia> f(xs...) = reduce(hcat, collect(xs));
julia> @btime hcat($v...); @btime reduce(hcat, $v); @btime f($v...);
155.834 μs (7 allocations: 390.97 KiB)
53.250 μs (2 allocations: 234.42 KiB) # without the splat
92.166 μs (5 allocations: 390.84 KiB) # first splat then reduce