julia Improve docs for `vcat`, `cat`, etc.

This wants to:

Mention the fast methods for reduce(vcat, A) under reduce and under vcat, and state exactly what types they accept.
Mention stack from #43334 as an alternative where appropriate, including under eachslice and mapslices.
Provide better examples for vcat, hcat. They are about the same length, but try not to print out things like b = [6 7; 8 9; 10 11; 12 13; 14 15] too often, instead illustrating more uses, from simple to complex.
Break up the wall of text for cat(A...; dims), and avoid saying the word "stacked" there.

Aug 21 '22 17:08 mcabbott

Before we canonize reduce(vcat,itr) and friends as the proper way to do these things, should we consider whether this is actually something we want to do? I've always despised these specializations. They're difficult to discover (this doc PR helps) and they don't compose or behave like you'd hope, e.g. reduce(vcat ∘ vec,itr) or mapreduce(vec,vcat,itr) will totally miss the specializations and be slow as anything.

#43334 applies in a narrow subset of cases. #46003 would apply in some others. But really, we still lack a generic function for concatenation of iterables. cat provides the analog of min but we lack an analog of minimum. I closed #45563 awhile back but I don't think stack from #43334 really met my hopes on this matter.

Aug 23 '22 17:08 mikmoore

These specialisations probably cannot be removed, so at least they should be clearly documented, including the precise signature which triggers them. If I'm not mistaken you have to look at methods(reduce) to know this, at present.

I agree they are hard to discover, and fragile. stack covers only some uses. Maybe other cases are off-topic for this PR so I fold them here:

Similar code could certainly be written for an efficient eager flatten; in some sense that would be the reduce(vcat, v_v) to match stack's reduce(hcat, v_v), and has the precedent of Iterators.flatten to describe exactly what it does. Whether it's likely to be approved I don't know.

Many other rules are possible. If I understand right, https://github.com/JuliaLang/julia/pull/46003 proposes a rule with a kron-like combining of the axes of inner and outer arrays:

julia> A = [rand(2,3) for _ in 1:4, _ in 1:5];

julia> awfulstack(A) == hvcat(5, permutedims(A)...)
true

julia> using TensorCast

julia> awfulstack(A) == @cast out[(i,I), (j,J)] := A[I,J][i,j]
true

julia> stack(A) == @cast out[i,j,I,J] := A[I,J][i,j]
true

julia> awfulstack(A) == @cast out[(i,I), (j,J)] := stack(A)[i,j,I,J]
true

On a uniform-size array of arrays like A, this is (like flatten) some reshape & permutedims of stack. There's also some extension to non-uniform arrays which I don't quite see.

Aug 23 '22 18:08 mcabbott

Technically, removing these performance shortcuts would be non-breaking since the fallback is still operable. That said, reversion to the fallback would impact performance very negatively and for that reason I imagine we'd keep them for a long time anyway.

I expect that, if we do ever manage a more generic function, these specialization could still exist (for legacy reasons) although be replaced by forwarding to that function. So, with that in mind and with the lack of an in-progress PR to create the aforementioned generalization, I'll retract my reservations about documenting them.

Aug 23 '22 20:08 mikmoore

Besides performance there are quirks like this which would probably break things:

julia> mapreduce(identity, hcat, [[1,2]])  # never calls hcat
2-element Vector{Int64}:
 1
 2

julia> reduce(hcat, [[1,2]])
2×1 Matrix{Int64}:
 1
 2

Perhaps we could make reduce(hcat, ::AbstractVector{<:AbstractVector}) call stack now, but there's not much gain, and maybe there are other weird edge cases.

Aug 23 '22 20:08 mcabbott

Bump... pinging @ViralBShah as he liked it in August?

Oct 29 '22 02:10 mcabbott

Thanks. I will come around to this soon. In general, I feel anything that is an improvement and is correct in a docs PR should be merged.

Oct 29 '22 16:10 ViralBShah