Add cat, vcat and hcat
Fixes #23 in case the offer from 2018 still stands :)
There were a few more edge cases than I had first thought, and I hope the testing covers them all.
There are a couple of cases where it is not possible to return a Fill or Ones, and then a normal Array is returned. The same goes for mixed types (e.g. `vcat(Fill(0, 3), Zeros(4))`), as I don't think there exists a feasible way to handle those cases.
Just close with prejudice if this is not wanted any more.
I guess this is kind of breaking, as there might be code out there which (perhaps inadvertently) relies on concatenation returning an Array.
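To make the potential breakage concrete, here is a hedged sketch (assuming this PR's methods are loaded; the exact return type is this PR's behavior, not released FillArrays):

```julia
using FillArrays

# With this PR, concatenating Fills holding equal values no longer has to
# materialize an Array; code that assumed a `Vector` result would now see
# a Fill instead. The element values are unchanged either way:
a = vcat(Fill(1, 3), Fill(1, 4))
a == fill(1, 7)  # same values as the eager result
```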
Codecov Report
Merging #140 (06724c6) into master (0606a8c) will increase coverage by 0.02%. The diff coverage is 96.15%.
```diff
@@            Coverage Diff             @@
##           master     #140      +/-   ##
==========================================
+ Coverage   95.66%   95.69%   +0.02%
==========================================
  Files           4        5       +1
  Lines         531      557      +26
==========================================
+ Hits          508      533      +25
- Misses         23       24       +1
```

| Impacted Files | Coverage Δ |
|---|---|
| src/FillArrays.jl | 94.27% <ø> (ø) |
| src/fillcat.jl | 96.15% <96.15%> (ø) |
Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.
Powered by Codecov. Last update 0606a8c...06724c6.
Won’t this not be type stable?
> Won’t this not be type stable?
The fact that it might return either an Array or a Fill/Ones depending on shape or contents?
Yes, I suppose so (ugh for the double negation :), it will not be type stable). Is there any other option besides giving up on the idea?
Maybe this was not your point, but performance is better due to fewer allocations regardless of any additional type instability. `code_warntype` gives me basically identical output, but maybe I'm not using it at the right level.
If this is about surprising the end user then I agree it's iffy and maybe is a reason why it should not happen.
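To spell out the instability: the return type would depend on the element values, not just the argument types (a sketch assuming this PR's cat methods are loaded):

```julia
using FillArrays

# Both calls have identical argument types, yet only the first can stay a
# Fill; the second must fall back to a Vector since the values differ,
# which is exactly what makes the return type uninferable.
vcat(Fill(1, 3), Fill(1, 4))  # values match
vcat(Fill(1, 3), Fill(2, 4))  # values differ
```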
```julia
julia> @benchmark cat(Fill(1,1), Fill(1,1); dims=1)
BenchmarkTools.Trial:
  memory estimate:  864 bytes
  allocs estimate:  12
  --------------
  minimum time:     562.500 ns (0.00% GC)
  median time:      656.522 ns (0.00% GC)
  mean time:        893.306 ns (8.02% GC)
  maximum time:     26.051 μs (97.09% GC)
  --------------
  samples:          10000
  evals/sample:     184

julia> @benchmark cat(fill(1,1), fill(1,1); dims=1)
BenchmarkTools.Trial:
  memory estimate:  704 bytes
  allocs estimate:  20
  --------------
  minimum time:     1.090 μs (0.00% GC)
  median time:      1.320 μs (0.00% GC)
  mean time:        1.610 μs (2.24% GC)
  maximum time:     363.690 μs (99.05% GC)
  --------------
  samples:          10000
  evals/sample:     10

julia> @benchmark cat(Fill(1,10000), Fill(1,10000); dims=1)
BenchmarkTools.Trial:
  memory estimate:  864 bytes
  allocs estimate:  12
  --------------
  minimum time:     559.239 ns (0.00% GC)
  median time:      630.978 ns (0.00% GC)
  mean time:        861.916 ns (8.70% GC)
  maximum time:     26.658 μs (97.32% GC)
  --------------
  samples:          10000
  evals/sample:     184

julia> @benchmark cat(fill(1,10000), fill(1,10000); dims=1)
BenchmarkTools.Trial:
  memory estimate:  313.14 KiB
  allocs estimate:  23
  --------------
  minimum time:     16.200 μs (0.00% GC)
  median time:      31.800 μs (0.00% GC)
  mean time:        76.410 μs (15.06% GC)
  maximum time:     3.978 ms (96.82% GC)
  --------------
  samples:          10000
  evals/sample:     1
```
The issue with type-unstable calls is not a single call with large arrays, but that the performance penalty propagates to other calls, which can have substantial (100x or more) performance degradations when called with small arrays. So I'm very much against introducing type-unstable code.
The Ones and Zeros cases should be fine, though please add `@inferred` to the tests.
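For instance, the tests could assert inferability along these lines (a sketch; the expected `Zeros`/`Ones` results assume the special cases from this PR are in place):

```julia
using FillArrays, Test

# `Test.@inferred` throws if the actual return type does not match the
# inferred one, so these tests catch accidental type instability in the
# Zeros/Ones concatenation special cases.
@test @inferred(vcat(Zeros(3), Zeros(4))) == Zeros(7)
@test @inferred(vcat(Ones(3), Ones(4))) == Ones(7)
```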
What's your use case? If we do https://github.com/JuliaArrays/FillArrays.jl/issues/104 then the value can be known at compile time for more general arrays. Though that issue isn't completely thought through; e.g., Ones{BigFloat} cannot become SFill{one(BigFloat),BigFloat} since BigFloat is not a bitstype:
```julia
julia> struct Foo{T} end

julia> Foo{1.0}() # fine since `1.0` is a bits type
Foo{1.0}()

julia> Foo{big(1.0)}()
ERROR: TypeError: in Type, in parameter, expected Int64, got a value of type BigFloat
Stacktrace:
 [1] top-level scope
   @ REPL[3]:1
```
Hi,
As for my use case, without making things too complicated: I'm using FillArrays to help keep memory utilization down, but some parts of my code concatenate arrays. I can easily work around it in my code by catching the Fills and giving them special treatment, but then I saw #23 and thought I should just make a contribution instead, as the problem looked simple on the surface.
I'm sorry if this is more a burden than a help. Feel free to close the issue if you don't want to waste more time on me. No hard feelings I promise :)
I also made the Ones and Zeros individual commits, so it should be possible to cherry pick them.
#104 would certainly help, but there is still the padding case when concatenating along multiple dimensions, which means that only Zeros can be type stable for all allowed inputs to cat.
Here is an example of the "padding case" in case it's not clear what I'm talking about:
```julia
julia> cat(ones(2), ones(2); dims=(1, 2))
4×2 Matrix{Float64}:
 1.0  0.0
 1.0  0.0
 0.0  1.0
 0.0  1.0
```
I guess the problem is that I have 4 elements and I want 8. Base uses zero(T) to fill in the rest of the elements, and I think it is important to be consistent with that.
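This is why, among the special types, only Zeros can stay type stable under multi-dimensional cat: the zero(T) padding that Base inserts agrees with Zeros but not with Ones (a sketch of the expected values):

```julia
using FillArrays

# The padding Base inserts is zero(T), so concatenating Zeros along
# dims=(1, 2) is still all zeros, while concatenating Ones produces a
# block-diagonal result that contains both ones and zeros.
cat(Zeros(2), Zeros(2); dims=(1, 2))  # all zeros either way
cat(Ones(2), Ones(2); dims=(1, 2))    # cannot be a Ones
```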
There is a lower-level method which dispatches on the dims, and maybe that can be used, but I'm not sure if it will help, and it's difficult to test as cat does not seem to be type stable to begin with; see below.
Another way I reasoned about the type stability is that for mutable arrays, all flavours of cat will allocate, so if you put cat in a hot loop I don't think type instability is your worst problem.
I understand about the small arrays, and I did include them in the benchmark. I understand about the propagation, and that is of course harder to benchmark, but will it ever create worse problems than the allocations?
Here are benchmarks for small arrays with escaped input along with code_warntype. Note the difference between Fill (with capital F) and fill:
```julia
julia> @benchmark cat($(Fill(1,1)), $(Fill(1,1)); dims=1)
BenchmarkTools.Trial:
  memory estimate:  928 bytes
  allocs estimate:  14
  --------------
  minimum time:     582.558 ns (0.00% GC)
  median time:      697.674 ns (0.00% GC)
  mean time:        937.771 ns (8.54% GC)
  maximum time:     29.799 μs (94.53% GC)
  --------------
  samples:          10000
  evals/sample:     172

julia> @benchmark cat($(fill(1,1)), $(fill(1,1)); dims=1)
BenchmarkTools.Trial:
  memory estimate:  512 bytes
  allocs estimate:  18
  --------------
  minimum time:     969.231 ns (0.00% GC)
  median time:      1.169 μs (0.00% GC)
  mean time:        1.385 μs (1.92% GC)
  maximum time:     268.692 μs (98.86% GC)
  --------------
  samples:          10000
  evals/sample:     13

julia> @code_warntype cat(Fill(1,1), Fill(1,1); dims=1)
Variables
  #unused#::Core.Const(Base.var"#cat##kw"())
  @_2::NamedTuple{(:dims,), Tuple{Int64}}
  @_3::Core.Const(cat)
  A::Tuple{Fill{Int64, 1, Tuple{Base.OneTo{Int64}}}, Fill{Int64, 1, Tuple{Base.OneTo{Int64}}}}
  dims::Int64
  @_6::Int64

Body::Any
1 ─ %1  = Base.haskey(@_2, :dims)::Core.Const(true)
│         %1
│         (@_6 = Base.getindex(@_2, :dims))
└──       goto #3
2 ─       Core.Const(:(Core.UndefKeywordError(:dims)))
└──       Core.Const(:(@_6 = Core.throw(%5)))
3 ┄       (dims = @_6)
│   %8  = (:dims,)::Core.Const((:dims,))
│   %9  = Core.apply_type(Core.NamedTuple, %8)::Core.Const(NamedTuple{(:dims,), T} where T<:Tuple)
│   %10 = Base.structdiff(@_2, %9)::Core.Const(NamedTuple())
│   %11 = Base.pairs(%10)::Core.Const(Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}())
│   %12 = Base.isempty(%11)::Core.Const(true)
│         %12
└──       goto #5
4 ─       Core.Const(:(Core.tuple(@_2, @_3)))
└──       Core.Const(:(Core._apply_iterate(Base.iterate, Base.kwerr, %15, A)))
5 ┄ %17 = Core.tuple(dims, @_3)::Core.PartialStruct(Tuple{Int64, typeof(cat)}, Any[Int64, Core.Const(cat)])
│   %18 = Core._apply_iterate(Base.iterate, Base.:(var"#cat#131"), %17, A)::Any
└──       return %18

julia> @code_warntype cat(fill(1,1), fill(1,1); dims=1)
Variables
  #unused#::Core.Const(Base.var"#cat##kw"())
  @_2::NamedTuple{(:dims,), Tuple{Int64}}
  @_3::Core.Const(cat)
  A::Tuple{Vector{Int64}, Vector{Int64}}
  dims::Int64
  @_6::Int64

Body::Any
1 ─ %1  = Base.haskey(@_2, :dims)::Core.Const(true)
│         %1
│         (@_6 = Base.getindex(@_2, :dims))
└──       goto #3
2 ─       Core.Const(:(Core.UndefKeywordError(:dims)))
└──       Core.Const(:(@_6 = Core.throw(%5)))
3 ┄       (dims = @_6)
│   %8  = (:dims,)::Core.Const((:dims,))
│   %9  = Core.apply_type(Core.NamedTuple, %8)::Core.Const(NamedTuple{(:dims,), T} where T<:Tuple)
│   %10 = Base.structdiff(@_2, %9)::Core.Const(NamedTuple())
│   %11 = Base.pairs(%10)::Core.Const(Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}())
│   %12 = Base.isempty(%11)::Core.Const(true)
│         %12
└──       goto #5
4 ─       Core.Const(:(Core.tuple(@_2, @_3)))
└──       Core.Const(:(Core._apply_iterate(Base.iterate, Base.kwerr, %15, A)))
5 ┄ %17 = Core.tuple(dims, @_3)::Core.PartialStruct(Tuple{Int64, typeof(cat)}, Any[Int64, Core.Const(cat)])
│   %18 = Core._apply_iterate(Base.iterate, Base.:(var"#cat#131"), %17, A)::Any
└──       return %18
```
Maybe the type instability of cat for normal arrays is just a 1.6 bug?
The padding cat is pretty specialised, so I'm more worried about the type stability of `vcat(Fill(1,5), Fill(2,6))`.
My recommendation is in your own code do something like:
```julia
_cat(a...; kwds...) = Base.cat(a...; kwds...)
_cat(a::AbstractFill...; kwds...) = # special case
```
Though you might be better off using LazyArrays.Vcat, etc., to always have a lazy concatenation.
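The special case above might be fleshed out roughly as follows (a hypothetical sketch: `FillArrays.getindex_value` is an internal helper, and only the dims=1, equal-value case is handled here):

```julia
using FillArrays

_cat(a...; kwds...) = Base.cat(a...; kwds...)

# Hypothetical special case: when every argument is a fill-like array
# holding the same value, vertical concatenation can stay a Fill.
function _cat(a::AbstractFill...; dims=1)
    v = FillArrays.getindex_value(first(a))
    if dims == 1 && all(x -> FillArrays.getindex_value(x) == v, a)
        return Fill(v, sum(length, a))
    end
    return Base.cat(a...; dims=dims)  # fall back to the eager result
end
```

Note that this still has the value-dependent return type discussed above; keeping it as a private helper in user code sidesteps the concern about `Base.cat` itself becoming type unstable.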
Even this case does not appear to be any more type-unstable than the normal cat:
```julia
julia> A,B = Fill(1, 1), Fill(2,1)

julia> @code_warntype cat(A, B; dims=1)
Variables
  #unused#::Core.Const(Base.var"#cat##kw"())
  @_2::NamedTuple{(:dims,), Tuple{Int64}}
  @_3::Core.Const(cat)
  A::Tuple{Fill{Int64, 1, Tuple{Base.OneTo{Int64}}}, Fill{Int64, 1, Tuple{Base.OneTo{Int64}}}}
  dims::Int64
  @_6::Int64

Body::Any
1 ─ %1  = Base.haskey(@_2, :dims)::Core.Const(true)
│         %1
│         (@_6 = Base.getindex(@_2, :dims))
└──       goto #3
2 ─       Core.Const(:(Core.UndefKeywordError(:dims)))
└──       Core.Const(:(@_6 = Core.throw(%5)))
3 ┄       (dims = @_6)
│   %8  = (:dims,)::Core.Const((:dims,))
│   %9  = Core.apply_type(Core.NamedTuple, %8)::Core.Const(NamedTuple{(:dims,), T} where T<:Tuple)
│   %10 = Base.structdiff(@_2, %9)::Core.Const(NamedTuple())
│   %11 = Base.pairs(%10)::Core.Const(Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}())
│   %12 = Base.isempty(%11)::Core.Const(true)
│         %12
└──       goto #5
4 ─       Core.Const(:(Core.tuple(@_2, @_3)))
└──       Core.Const(:(Core._apply_iterate(Base.iterate, Base.kwerr, %15, A)))
5 ┄ %17 = Core.tuple(dims, @_3)::Core.PartialStruct(Tuple{Int64, typeof(cat)}, Any[Int64, Core.Const(cat)])
│   %18 = Core._apply_iterate(Base.iterate, Base.:(var"#cat#131"), %17, A)::Any
└──       return %18

julia> a,b = fill(1, 1), fill(2,1)

julia> @code_warntype cat(a, b; dims=1)
Variables
  #unused#::Core.Const(Base.var"#cat##kw"())
  @_2::NamedTuple{(:dims,), Tuple{Int64}}
  @_3::Core.Const(cat)
  A::Tuple{Vector{Int64}, Vector{Int64}}
  dims::Int64
  @_6::Int64

Body::Any
1 ─ %1  = Base.haskey(@_2, :dims)::Core.Const(true)
│         %1
│         (@_6 = Base.getindex(@_2, :dims))
└──       goto #3
2 ─       Core.Const(:(Core.UndefKeywordError(:dims)))
└──       Core.Const(:(@_6 = Core.throw(%5)))
3 ┄       (dims = @_6)
│   %8  = (:dims,)::Core.Const((:dims,))
│   %9  = Core.apply_type(Core.NamedTuple, %8)::Core.Const(NamedTuple{(:dims,), T} where T<:Tuple)
│   %10 = Base.structdiff(@_2, %9)::Core.Const(NamedTuple())
│   %11 = Base.pairs(%10)::Core.Const(Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}())
│   %12 = Base.isempty(%11)::Core.Const(true)
│         %12
└──       goto #5
4 ─       Core.Const(:(Core.tuple(@_2, @_3)))
└──       Core.Const(:(Core._apply_iterate(Base.iterate, Base.kwerr, %15, A)))
5 ┄ %17 = Core.tuple(dims, @_3)::Core.PartialStruct(Tuple{Int64, typeof(cat)}, Any[Int64, Core.Const(cat)])
│   %18 = Core._apply_iterate(Base.iterate, Base.:(var"#cat#131"), %17, A)::Any
└──       return %18
```
Or am I doing this wrong?
There is some performance overhead for that case though, but I assume that is because the current implementation does a bit of double work in that case:
```julia
julia> A,B = Fill(1, 1), Fill(2,1)

julia> @benchmark cat($A, $B; dims=1)
BenchmarkTools.Trial:
  memory estimate:  1.53 KiB
  allocs estimate:  34
  --------------
  minimum time:     1.330 μs (0.00% GC)
  median time:      1.610 μs (0.00% GC)
  mean time:        2.016 μs (6.58% GC)
  maximum time:     490.140 μs (98.96% GC)
  --------------
  samples:          10000
  evals/sample:     10

julia> a,b = fill(1, 1), fill(2,1)

julia> @benchmark cat($a, $b; dims=1)
BenchmarkTools.Trial:
  memory estimate:  512 bytes
  allocs estimate:  18
  --------------
  minimum time:     1.030 μs (0.00% GC)
  median time:      1.260 μs (0.00% GC)
  mean time:        1.466 μs (2.48% GC)
  maximum time:     366.170 μs (99.18% GC)
  --------------
  samples:          10000
  evals/sample:     10
```
I think some of the double work can be eliminated at the cost of duplicating some functionality from Base.
Anyway, your concerns, as well as my overall nervousness about tampering with such a fundamental thing, are enough to deter me. Sorry for the noise.
Can we leave this open? Someone else might be interested. In any case, we should have the Ones and Zeros special cases.
Sure, sorry for being too trigger happy.
If there are any changes you'd like me to do I'm happy to incorporate them.
Thanks for this @DrChainsaw , this would be super-helpful for me (I often have to vcat vectors of constant/static statistical weights).
general cat where values can be different can be achieved by LazyArrays.jl.
> general cat where values can be different can be achieved by LazyArrays.jl.
Yes, I think the result would be very different from what @DrChainsaw is trying to achieve here.
I would very much prefer a type-stable vcat though, otherwise we may introduce type-instability in user applications in a way that's hard for them to escape (they'd need to add special handling for FillArray). I think specializing vcat and friends for Ones and Zeros would be very valuable though.
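A minimal sketch of what such specializations could look like (the signatures are assumptions, not a final API; defining these outside FillArrays would be type piracy, so they belong in the package itself):

```julia
using FillArrays

# The result of vcat-ing two Zeros (or Ones) vectors of the same element
# type is fully determined by the input lengths, so the return type is
# concrete and inferable -- no type instability is introduced.
Base.vcat(a::Zeros{T,1}, b::Zeros{T,1}) where {T} = Zeros{T}(length(a) + length(b))
Base.vcat(a::Ones{T,1}, b::Ones{T,1}) where {T} = Ones{T}(length(a) + length(b))
```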
LazyArrays.Vcat is type stable. It's just not minimal.
Thanks @putianyi889. I do indeed have use for the functionality LazyArrays offers, since my use case also includes normal arrays (although I agree it does something very different from what FillArrays could do).
However, I have tried to use RecursiveArrayTools.jl, which seems to do a similar thing, and the result was that I ended up with infeasibly long compile times, so it seems that my use case would actually prefer a type-unstable version.
As for type-stable versions: I think the commit for Zeros from this PR can be used as is (except that it maybe overloads a Base internal method), while Ones might need a separate version which dispatches directly on vcat/hcat because of this. Maybe this inconsistency is also reason enough to only do vcat and hcat for Zeros too. I think the best course of action is to just close this and open a new PR. I'm not sure if people are waiting for me to do this, though...
> I'm not sure if people are waiting for me to do this though...
Can't speak for other people, but I have actual statistics use cases (merging sets of samples that all have weight one by construction) that would profit from a custom vcat on Ones.
If you need a minimal but type-unstable result, you could write a custom method to reduce/simplify an output.
> I think the best course of action is to just close this and open a new PR.
Agreed. At least cat of Ones/Zeros only can be type stable. General cases can be left for now.
> At least cat of Ones/Zeros only can be type stable. General cases can be left for now.
I'd vote for that, then. :-)