
Add `stack(array_of_arrays)`

Open mcabbott opened this issue 3 years ago • 39 comments

This adds a ~~simple~~ version of the function stack described in #21672. It generalises reduce(hcat, vector_of_vectors) to handle more dimensions, and to handle iterators efficiently.

~~For iterators, like #31644 this handles only cases for which repeated append! is the fallback, except that it will also widen the type if necessary. You would need quite a different implementation to handle things like reduce(vcat, ([1 i i^2] for i in 1:10 if i>pi)), probably collecting first to know where to write the first slice. Because of that, it does not take a dims keyword.~~ Thus it can be the inverse of eachcol(::Matrix) or eachslice(::Array{T,3}; dims=3) ~~but not of eachrow.~~

Edit: The current implementation introduced in this message below takes a dims keyword. With this, stack(eachslice(A; dims); dims) == A. Without the dims keyword, the dimensions of the slice always come first, thus it should obey vec(stack(xs)) == collect(Iterators.flatten(xs)). Iterators of unknown length are collected first, as this seems quicker than falling back to append!-and-reshape.
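
A small sketch of these two identities, assuming the stack from this PR (sizes are arbitrary):

A = rand(3, 4, 5)
stack(eachslice(A; dims=3); dims=3) == A          # the dims keyword round-trips eachslice
xs = [rand(3) for _ in 1:4]
vec(stack(xs)) == collect(Iterators.flatten(xs))  # without dims, slice dimensions come first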

Handling iterators also makes it trivial to allow stack(f, xs) = stack(f(x) for x in xs), as suggested below. This seems nice to have with a do block.
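
For example (made-up data), the do-block form would be:

stack([1, 2, 3]) do i
    [i, 10i]    # each result becomes a column, giving a 2×3 Matrix
end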

~~It does not currently handle the empty case, as I'm not sure what it should do.~~ Ideally stack(empty!([1:5])) would make a 5×0 Matrix, but that's impossible. At present it makes a 1×0 Matrix. The empty handling might be more complicated than ideal, see below.

Should work fine with OffsetArrays, since it preserves all axes. Should mostly work with CUDA.

The name unfortunately clashes with DataFrames.stack. Other names suggested in #21672 include pack or nested; perhaps seeing it as the inverse of eachslice might suggest others?

mcabbott avatar Dec 04 '21 17:12 mcabbott

As I stated in https://github.com/JuliaLang/julia/issues/21672, a concern is that stack([1,2],[3,4]) could reasonably mean [1,2,3,4] as well as [1 3;2 4]. pack actually suggests the first interpretation to me; same for flatten. nested feels like the opposite operation to stack. unnest is ugly but might convey the intention better? Or matrixify? Matrix(::AbstractVector) is free, but it would feel weird if the result was not a Matrix? Not saying that it should block the PR (it's a useful functionality that deserves a name, even if that name is not unambiguous), but worth thinking over in case there's a better alternative.

antoine-levitt avatar Dec 04 '21 19:12 antoine-levitt

Re names, first English: If you want to "stack" some papers, or some hay, then you don't put the pieces together on their long dimensions as that would fall over. Maybe it becomes less clear if you stack textbooks. Whereas "concatenate" seems to come from liturgy, to mean what it does for strings, or architecture, like hcat of matrices. To me what "pack" suggest is that it fills the output completely, unlike e.g. cat(1,2,3; dims=(1,2)); maybe it leans towards new-axes?

Python's numpy.stack has this meaning, and explicitly distinguishes this "new axes" vs. cat's "same axes". In Mathematica there is no such operation; multi-dimensional arrays are lists of lists. It doesn't look like Matlab has such a function (separate from cat). I guess hcat([1,2], [3,4]) makes most sense if you think of these as really being 2×1 matrices.

In Julia, if this is the inverse of eachslice, it could be unslice? I think this perspective might make what it does clearer: Nobody expects eachcol to cut its input mid-way along a column, the dimensions are partitioned into those of the elements and that of the container. JuliennedArrays calls this Align, but perhaps that makes more sense for a view. There is also Iterators.flatten which always collects to a vector.

My objection to matrixify is that it ties this to 2 dimensions, which is where you need it least. It can't be a method of Array since [1:2, 3:4] isa Array, and it may make a CuArray. But something like dense (it makes a DenseArray, probably), or "solid" or "contiguous"?

mcabbott avatar Dec 04 '21 21:12 mcabbott

Could we also add a two-arg version with a function argument? (Could just be implemented as stack(Iterators.map(f, x)).) I find myself using the mapreduce(f, vcat, x) pattern quite often, but unfortunately that's currently really slow.

simeonschaub avatar Dec 04 '21 22:12 simeonschaub

Maybe concat (or even concatenate) would be in good analogy to e.g. max and maximum, as maximum(x) = max(x...). Here, concat(v, dims=dims) = cat(v..., dims=dims), and dims=ndims(first(v))+1 could be the default if no dims is present.
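
A rough sketch of this proposal (the name and default are only what's suggested here, not this PR's implementation):

concat(v; dims = ndims(first(v)) + 1) = cat(v...; dims = dims)

concat([[1, 2], [3, 4]])   # [1 3; 2 4], the new dimension added after the existing ones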

jonas-schulze avatar Dec 04 '21 22:12 jonas-schulze

I'm sure I've suggested this max/maximum analogy too somewhere. Arguments against it, and in favour of a new verb, are:

  • One, if M is a matrix, this is the inverse of eachcol in that stack(eachslice(M, dims=2)) == M. ~~If a future version took~~ It takes a dims keyword, ~~then a plausible meaning would be~~ which allows stack(eachrow(M); dims=1) == M. That can be useful (it's what Flux.stack does). But it's not the meaning of vcat, which would put the vectors of eachrow end-to-end.

  • Two, the column-major append! behaviour very naturally handles more dimensions, including container dimensions. ~~When #32310 is eventually merged~~ Now, eachslice(A4; dims=(3,4)) ~~will be~~ is a matrix of matrices, which stack puts back together. This also doesn't fit cat: cat(mats...; dims=(3,4)) instead makes some kind of block-diagonal arrangement.

  • [Edit] Three, this implementation traverses non-array elements, stack(((1,2), (x=3, y=4))) is quite unlike hcat((1,2), (x=3, y=4)) or any other cat function. I think that a function with a name closely connected to cat should not differ on this. And that this is a nice bonus, rather than an essential feature.

  • [Edit] Related to one, I think there's clarity in the idea that cat always extends existing axes (possibly trivial, size([1,2,3],4) == 1) while stack always places arrays along new axes. This is the same distinction made in Python, mentioned above. It is also why stack has obvious rules for OffsetArrays.

[Edit] Since this thread is now very long, I've enlarged this slightly. There are examples below where the point was raised again.

mcabbott avatar Dec 04 '21 23:12 mcabbott

There is also PackedVectorsOfVectors.pack

longemen3000 avatar Dec 05 '21 06:12 longemen3000

Ref https://github.com/JuliaLang/julia/pull/16708

mschauer avatar Dec 10 '21 21:12 mschauer

Thanks, I hadn't seen that iterator approach. Perhaps something like that could now be written as a mixture of Iterators.flatten (same iteration) and Iterators.product (same shape, given the first element).

But can it be fast? It seems that flatten is a lot slower than reduce(vcat, xs):

julia> xs = [rand(128) for _ in 1:100];

julia> v = @btime collect(Iterators.flatten($xs));
  min 55.750 μs, mean 64.555 μs (9 allocations, 326.55 KiB)

julia> v ≈ @btime collect(reduce(vcat, $xs))
  min 4.542 μs, mean 13.543 μs (5 allocations, 200.97 KiB)
true

mcabbott avatar Dec 11 '21 01:12 mcabbott

Yes, it is a tricky thing; that's perhaps why it didn't go anywhere. On the other hand

julia> using StaticArrays

julia> xs = [@SVector rand(3) for _ in 1:100];

julia> v = @btime collect(Iterators.flatten($xs));
  592.251 ns (1 allocation: 2.50 KiB)

julia> v ≈ @btime collect(reduce(vcat, $xs))
  861.942 ns (3 allocations: 5.88 KiB)

so reduce(hcat,...) isn't the last word either...

mschauer avatar Dec 11 '21 08:12 mschauer

Indeed, that's pretty fast. (Notice we're penalising reduce(hcat, xs) by collecting twice, my mistake.)

The analogy to Iterators.flatten is perhaps a useful way to think about this. If it collects an iterator of iterators, then the slices need not be arrays. Which means that something like this can avoid allocating them, for an efficient small-slice mapslices:

stack(xs, eachcol(Y)) do x, y
    x/y[1], y[2]/y[3], y[4]/x
end

Allowing this seems like an advantage of not calling this *cat, since that has other rules: hcat([1,2], [3,4]) != hcat((1,2), (3,4)).

I think I have a better implementation now, which doesn't make an iterator but does accept them. It deletes all the code dealing with append! and type-widening, instead requiring that it knows the size and type up front, if necessary by collecting. On this example:

julia> xs = [@SVector rand(3) for _ in 1:100];

julia> m = @btime reduce(hcat, $xs);
  min 539.243 ns, mean 843.973 ns (1 allocation, 2.50 KiB)

julia> m ≈ @btime reshape(collect(Iterators.flatten($xs)), 3, :)
  min 344.329 ns, mean 637.694 ns (3 allocations, 2.59 KiB)
true

julia> m ≈ @btime reshape(collect(Iterators.flatten(Tuple(x) for x in xs)), 3, :)
  min 1.492 μs, mean 2.170 μs (10 allocations, 7.73 KiB)
true

julia> m ≈ @btime stack(Tuple(x) for x in $xs)
  min 123.719 ns, mean 417.366 ns (1 allocation, 2.50 KiB)
true

Still a bit WIP, am not entirely sure whether to push to this PR or make a separate one. It also wants to allow dims, but what's still slow is stackT(Tuple(x) for x in xs; dims=1).

mcabbott avatar Dec 12 '21 18:12 mcabbott

Triage likes this. The name situation with DataFrames is tricky, but we do think stack is the canonical name so it would be good to go with that.

JeffBezanson avatar Dec 16 '21 21:12 JeffBezanson

FWIW, DataFrames.stack only has methods for AbstractDataFrame, so it could simply share the name with Base.

julia> using DataFrames

julia> methods(stack)
# 3 methods for generic function "stack":
[1] stack(df::AbstractDataFrame) in DataFrames at /Users/mason/.julia/packages/DataFrames/ORSVA/src/abstractdataframe/reshape.jl:134
[2] stack(df::AbstractDataFrame, measure_vars) in DataFrames at /Users/mason/.julia/packages/DataFrames/ORSVA/src/abstractdataframe/reshape.jl:134
[3] stack(df::AbstractDataFrame, measure_vars, id_vars; variable_name, value_name, view, variable_eltype) in DataFrames at /Users/mason/.julia/packages/DataFrames/ORSVA/src/abstractdataframe/reshape.jl:134

MasonProtter avatar Dec 16 '21 21:12 MasonProtter

One-month bump?

mcabbott avatar Jan 19 '22 14:01 mcabbott

Now that we have #32310, one question here is whether this should do stack(eachcol(x)) === x. At present this PR always makes a new array, but it could be permitted to return the parent sometimes. It could also be permitted to reinterpret arrays of SVectors, and things like stack([(1,4), (2,5), (3,6)]).
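
For instance, the reinterpret route mentioned here could look roughly like this (a sketch; the PR as it stands always copies):

using StaticArrays
vs = [SA[1, 4], SA[2, 5], SA[3, 6]]
reshape(reinterpret(Int, vs), 2, :)   # 2×3, sharing memory with vs, unlike stack(vs)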

mcabbott avatar May 17 '22 11:05 mcabbott

Always returning a new Array seems a better choice? At least we know that subsequent modifications to the output will never affect the original inputs. If we allow returning an aliased result, then users have to learn more implementation details.

Or we could add a keyword lazy = false, and only try these optimizations if the user sets it to true?

N5N3 avatar May 18 '22 01:05 N5N3

Or we could add a keyword lazy = false, and only try these optimizations if the user sets it to true?

Would it be possible to set this based on purity (in the sense of https://github.com/JuliaLang/julia/pull/43852)? If the data is not modified and the handle is not returned by the calling function, it should be safe to set lazy=true.

jonas-schulze avatar May 18 '22 07:05 jonas-schulze

Bump?

These comments https://github.com/JuliaLang/julia/pull/43334#pullrequestreview-835887585 are still what I think needs feedback.

mcabbott avatar May 31 '22 18:05 mcabbott

I'll add to the sentiment that concat or concatenate might be a more natural name for this operation. But it seems that this has already been in the works for a while so that ship may have sailed. It's just that it appears to match the relationship to cat/hcat/vcat that minimum has to min, so stack just doesn't seem as easily discoverable.

mikmoore avatar Jun 10 '22 04:06 mikmoore

See here above for my argument against cat->concatenate being like max->maximum.

One un-cat-like behaviour right now is the inverse of eachrow:

julia> stack([[1, 2, 3], [4, 5, 6]]; dims=1)  # unlike any cat, this "transposes" the vectors
2×3 Matrix{Int64}:
 1  2  3
 4  5  6

julia> eachslice(ans, dims=1)  # inverse operation
2-element RowSlices{Matrix{Int64}, Tuple{Base.OneTo{Int64}}, SubArray{Int64, 1, Matrix{Int64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}}:
 [1, 2, 3]
 [4, 5, 6]

julia> A4 = rand(5,6,7,8);

julia> A4 == stack(eachslice(A4; dims=(3,4)))  # also unlike any cat
true

Another is this:

julia> stack(((1,2), (3,4), (5,6)))
2×3 Matrix{Int64}:
 1  3  5
 2  4  6

julia> hcat((1,2), (3,4), (5,6))
1×3 Matrix{Tuple{Int64, Int64}}:
 (1, 2)  (3, 4)  (5, 6)

julia> Iterators.flatten(((1,2), (3,4), (5,6))) |> collect
6-element Vector{Int64}:
 1
 2
 3
 4
 5
 6

julia> stack(zip(1:3, 10:99))
2×3 Matrix{Int64}:
  1   2   3
 10  11  12

This allows stack(f, eachcol(m)) to act like mapslices with a non-allocating f returning a tuple, for small slices. 5x faster here.
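
A made-up example of that pattern, assuming this PR's stack:

m = rand(4, 100)
stack(eachcol(m)) do col
    (minimum(col), maximum(col))   # tuple slices, nothing allocated per column
end                                # 2×100 Matrix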

Last commit 9d9c3e1 notes the connection that the new Iterators.flatmap(f, xy...) added in https://github.com/JuliaLang/julia/pull/44792 is like vec(stack(f, xy...)) when this exists.

mcabbott avatar Jun 11 '22 02:06 mcabbott

Indeed, this is similar to flatmap, except instead of flattening a generic collection, we are concatenating n-dimensional arrays. I would definitely find this handy. I use mapslices sometimes, which is specialized to arrays, and I agree it would be nice to avoid the reduce(hcat,...) that I tend to write often. I never realized this was another flatmap waiting to happen!

Would the following implementation be similar? What are the main performance gotchas you have found?

julia> stack(f, c...) = reduce((a,b) -> hvncat(1, a, b), map(f, c...));

julia> stack(2:4) do x randn(x,3,2) end
9×3×2 Array{Float64, 3}:
[:, :, 1] =
 -0.127353   2.16478    -0.404491
  0.937129   0.806041    0.502847
  0.607888   1.7999     -0.363583
  0.214298   1.0726      0.60682
 -1.48141    0.0117138   0.204789
  0.599684  -1.21346    -0.654787
 -0.256475   1.48369    -1.91673
  0.621435   0.836231    2.04466
 -1.76056    0.814012   -0.636969

[:, :, 2] =
  0.314927    1.38884     0.124071
 -0.0897931  -0.155252   -0.343632
  0.0969514  -0.807055   -1.53663
 -0.903224    0.466151   -1.36028
 -0.694321    0.125088    0.610418
 -0.572599   -0.222317   -0.473984
 -1.72662    -0.0908632  -0.738871
 -0.1956     -0.181142    1.8754
  0.747257    1.36481    -0.325992

nlw0 avatar Jul 09 '22 18:07 nlw0

Answering myself: hvncat(1,...) is probably what we want to avoid; it should be hvncat(3,...) in the previous example, and stack(1:3) do x randn(2,2,x) end. But is there an easy way to find the number of dimensions?

nlw0 avatar Jul 09 '22 18:07 nlw0

Another shot at it, just to illustrate foldl. Not sure whether Julia can write the output of f straight as an extension of the existing array; is this the kind of optimization you can achieve?...

function stack(f, c...)
    args = collect(zip(c...))
    firstarray = f(args[1]...)
    Ndim = ndims(firstarray)
    foldl((a,b)->hvncat(Ndim, a, f(b...)), args[2:end], init=firstarray)
end
julia> stack(1:3) do x reshape(1:6*x,2,3,:) end
2×3×6 Array{Int64, 3}:
[:, :, 1] =
 1  3  5
 2  4  6

[:, :, 2] =
 1  3  5
 2  4  6

[:, :, 3] =
 7   9  11
 8  10  12

[:, :, 4] =
 1  3  5
 2  4  6

[:, :, 5] =
 7   9  11
 8  10  12

[:, :, 6] =
 13  15  17
 14  16  18

nlw0 avatar Jul 09 '22 18:07 nlw0

Taking a closer look at the PR, I have a feeling it might have been stimulated by the fact Julia didn't have hvncat until 1.7, so I'm not sure what is meant by "the cat functions", if it includes the relatively new hvncat.

I'd say it sounds like stack might have had a lot going for it before hvncat was available. Now maybe it would be useful to understand what specifically it's doing --- and I'm not even thinking about performance yet.

I think an array alternative to flatten might be in order. It would probably need to take an optional dims parameter, which would work as hvncat's first parameter, but it would work like flatten by taking a collection instead of separate parameters, thus implementing reduce(hcat, myarrays), which I'm really looking forward to.

The big question is what should be the default dims. ndims(myarrays[1])+1 seems to be the proposal. I was thinking of ndims, but incrementing might be the best (that's what hcat with vectors does). The main concern for me is that, looking at the memory layout, it's a simple concatenation of the data, and that happens in both these cases. (Because it's column-major... Am I right? :) )

Having said that, it would look like stack is basically a smarter hvncat, with a default dims and the ability to inspect the shape of the elements. Also I'd suggest it should try to be the array alternative to flatten, and then maybe there could be a stackmap that maps and then stacks, like the basic implementation of flatmap. Either that, or stackmap is the main function and stack is just stackmap with the identity function, as could be the case with flatmap. But this is basically an implementation choice.

One last thought: the issues with reduce may or may not be related to #45000. That's why you might like to go with foldl.

nlw0 avatar Jul 09 '22 20:07 nlw0

I now understand better the very subtle difference to cat. If I understand correctly, this would also work to replace another one of my favorite utility functions that I use to join multiple images into a big one, like ImageMagick's montage: jointiles(xx) = hcat([vcat([x for x in yy]...) for yy in eachcol(xx)]...)

nlw0 avatar Jul 10 '22 11:07 nlw0

This answer https://github.com/JuliaLang/julia/pull/43334#issuecomment-986135890 to @jonas-schulze seems to ignore the main point. The comment proposes "a new verb" very much like stack, which would be concat or concatenate, which do not exist in Julia yet. The point is not that stack has a conflict with cat, but that they have a relationship similar to that of max and maximum, which I understand as being the fact that maximum(a) = reduce(max, a); the ability of max and other functions to take more arguments is a mere nifty feature on top of their fundamental tasks as binary operations.

This is the important relationship that must be well understood; whether this reducing function should be named stack, concat or wakawakatapeng is a different issue. It's not that cat makes stack moot. But it's important to understand how any performance improvements used there could benefit e.g. reduce in general.

The big difference I see now between stack and flatten and reduce(cat,...) is that it can take a multi-dimensional array of multidimensional arrays, and concatenate them all. I think it's a great feature, and in functions such as flatten it doesn't happen because arrays are merely traversed linearly. On the other hand, in the end we are still outputting an array that can be easily reshaped. And flatten does take a collection as input, not separate elements like cat. So they are very similar, the difference being that flatten is not aware of arrays. Could it be, though? Also, it might benefit, as well as other functions, from the optimizations done in stack. And stack might benefit from optimizations done in other functions such as reduce(cat, ...). I find it difficult to understand how these relationships are not clear, and I think they are important.

nlw0 avatar Jul 10 '22 15:07 nlw0

I appreciate the interest here, but this is a long thread already. Can I suggest that an unstructured discussion of what's generally going on might be better on Zulip/Slack or something?

The proposed change is pretty clearly described, I think, and has triage approval. As far as I know little performance is left on the table. Simple ideas are often 20x slower.

default dims. ndims(myarrays[1])+1 seems to be the proposal

This is not quite correct. The 2nd & 3rd paragraphs of the docstring try to explain the difference between not providing dims (the default is dims=:), and providing an integer. It is the first which is closest to Iterators.flatten. The "Higher-dimensional examples" should show all variants.

but that they have a similar relationship as max and maximum

There are similarities, of course. But once you look closer they are sufficiently different that this analogy is more confusing than helpful, IMO. No cat function un-does eachrow. New axes vs. same axes. Etc, as detailed far above, and illustrated above.

mcabbott avatar Jul 10 '22 17:07 mcabbott

My comments are unrelated to implementation or performance. I'm sure the implementation is excellent. Perhaps the function might be implemented as a special method of flatten, and the relationship with cat must be recognized. Iterators.flatten simply ignores array structures, but it might be useful to have it produce elements from the block-matrix, aware of the array structures, instead of simply iterating over the arrays. Or maybe a separate function is justified, but it's still useful to understand how this array-of-arrays traversal feature works. And since this appears to be the really distinct feature proposed, it should be emphasized more.

cat takes separate elements, not a list or array, this is clear. It could never be the inverse of eachcol. That's because cat has no reducing counterpart analogous to the max/maximum pair. It's not that simple, though, because it's not a simple reduction: it takes into account array structure. I think it's a great functionality. I'm just trying to understand how it relates to other functions. reduce can take a dims argument, for instance; I'm not sure exactly how it works, but might there not be a way to attain the same functionality that way? Maybe reduce and cat could benefit from some refactoring. Allow me to clarify again, I am not considering performance, and I think this PR is great and should be merged.

The docstring says about cat: "rather than placing the arrays along new dimensions." What does that mean?

nlw0 avatar Jul 11 '22 06:07 nlw0

I know nobody's interested, but for those who are interested this is just what I was curious to find out. Could probably use a fold instead of recursion. Peace out.

@show dd = map([(j,k,l) for j in 1:3, k in 1:3, l in 1:3]) do (jkl)
    reshape(1:prod((jkl)), jkl...)
end

catdim(dims) = (a,b) -> cat(a,b,dims=dims)

reduce(catdim(3), (reduce(catdim(2), (reduce(catdim(1), view(dd,:,b,c)) for b in 1:3)) for c in 1:3))

function awfulstack(dd)
    mydim = ndims(dd)
    reduce(catdim(mydim), (awfulstack_(dd, mydim-1, n) for n in 1:size(dd, mydim)))
end
function awfulstack_(dd, mydim, indices...)
    if mydim==1
        reduce(catdim(mydim), view(dd,:,indices...))
    else
        reduce(catdim(mydim), (awfulstack_(dd, mydim-1, n, indices...) for n in 1:size(dd, mydim)))
    end
end

@show aa = awfulstack(dd)

nlw0 avatar Jul 11 '22 07:07 nlw0

Would be great to see the so-called stack functionality finally in Base!

It's positioned as the mirror operation to eachslice, and indeed it is. However, it's somewhat unfortunate to have these asymmetries:

  • Naming: eachslice and stack have nothing in common. No way to guess the other function if you already know one of them. Compare with splitdims + combinedims in SplitApplyCombine, where a typical processing pipeline looks like A |> splitdims |> filter(...) |> map(...) |> combinedims. It's clear that the two steps "invert" each other.
  • eachslice returns a view (right?), while stack will materialize the result (right?). This is important for both performance and for modifying the result - the changes either propagate to the original array or not. Again, the existing implementation in SplitApplyCombine makes the difference clear: splitdims and combinedims are copies, splitdimsview and combinedimsview are views. Both ...view functions can be used to modify the original array.

Do you think the situation with either of these asymmetries can be improved?

aplavin avatar Jul 13 '22 06:07 aplavin

Re names, "stack" vs "slice" are plausibly a pair, right? The weird part is "each", which (IIRC) was because these were originally iterators.

One reason for the view asymmetry is that, while a slice of a larger Array is almost as good as a solid array (it has one pointer, and will often be acceptable to BLAS etc), a stacked view of many smaller arrays is an awkward thing (indexing needs to follow different pointers, and no CuArray code is going to work) thus not an obvious performance improvement. I think this is why triage last year thought that the eager version ought to be in base, https://github.com/JuliaLang/julia/issues/21672#issuecomment-880943222 , which this PR followed (allowing also (size(A[1])..., size(A)...) from https://github.com/JuliaLang/julia/issues/21672#issuecomment-299486794 since it seems natural.)
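
Concretely, the result shape described in that linked comment is slice dimensions first, then container dimensions (sizes here are arbitrary):

A = [rand(2, 3) for _ in 1:4, _ in 1:5]   # a 4×5 Matrix of 2×3 Matrices
size(stack(A)) == (2, 3, 4, 5)            # (size(A[1])..., size(A)...)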

mcabbott avatar Jul 13 '22 12:07 mcabbott

Re names, "stack" vs "slice" are plausibly a pair, right?

Maybe... I'm not a native speaker, and they don't really associate for me. Possibly, this is just getting used to: the "split-apply-combine" term is so widely used that the split <-> combine link immediately comes into mind.

As for views vs materialized, there are different usecases. Sometimes you don't need to access the whole resulting array multiple times; then a view can be faster even in the stack direction, instead of allocating yet another copy. Even if not faster, it can be useful to propagate setindex to original arrays. Even something like this actually works, through both split and combine:

A = ...
B = A |> splitdimsview |> filter(...) |> combinedimsview
B[123, 456] = 789
# A modified correspondingly

It feels quite strange to only have a view function in one direction, and only a materializing function in the other.

aplavin avatar Jul 13 '22 15:07 aplavin

There are cases where the view is faster, and many packages have written such a thing. In https://github.com/JuliaLang/julia/issues/21672 there was not much enthusiasm for adding a new stacked array type to Base, and this PR accordingly makes the eager version.

Being eager also lets it accept an iterator of iterators at no cost, rather than just an array of arrays, so that stack(f, xs) need not materialise the intermediate map(f, xs). It seems that many people have wished that something like mapreduce(f, hcat, iter) could be efficient. Or that comprehensions would produce one array, like stack(f(x) for x in iter) does here.
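
For instance (a made-up example of the generator form accepted here):

iter = (i for i in 1:5 if isodd(i))   # length not known from the type
stack(i -> [i, i^2], iter)            # 2×3 Matrix, with no intermediate map(f, iter) array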

I have never found mutating the slices through some stacked type to be useful in practice. Unlike foreach(f!, eachcol(mat)) which lets you write something like an in-place mapslices!.

mcabbott avatar Jul 13 '22 15:07 mcabbott

Thanks for detailed explanations!

Yet another question: can stack be made type-stable? Currently, with custom array types, it correctly returns the same array type, but it infers as the union of that array type and a plain array:

julia> As = [StructArray(a=[1,2]), StructArray(a=[3,4])]

julia> @code_warntype stack(As)
MethodInstance for Main.var"workspace#5".stack(::Vector{StructArrays.StructVector{NamedTuple{(:a,), Tuple{Int64}}, NamedTuple{(:a,), Tuple{Vector{Int64}}}, Int64}})
  from stack(iter; dims) in Main.var"workspace#5" at /home/aplavin/.julia/pluto_notebooks/Remarkable science.jl#==#ddf3ff36-94df-408a-b25e-00fc9ee0e882:1
Arguments
  #self#::Core.Const(Main.var"workspace#5".stack)
  iter::Vector{StructArrays.StructVector{NamedTuple{(:a,), Tuple{Int64}}, NamedTuple{(:a,), Tuple{Vector{Int64}}}, Int64}}
Body::Union{StructArrays.StructArray{NamedTuple{(:a,), Tuple{Int64}}, 2, NamedTuple{(:a,), Tuple{Matrix{Int64}}}, Int64}, Matrix{NamedTuple{(:a,), Tuple{Int64}}}}
1 ─ %1 = Main.var"workspace#5".:(var"#stack#13")(Main.var"workspace#5".:(:), #self#, iter)::Union{StructArrays.StructArray{NamedTuple{(:a,), Tuple{Int64}}, 2, NamedTuple{(:a,), Tuple{Matrix{Int64}}}, Int64}, Matrix{NamedTuple{(:a,), Tuple{Int64}}}}
└──      return %1

At least, if I didn't make a mistake copying code from this PR.

aplavin avatar Jul 13 '22 21:07 aplavin

I meant to come back to check what array types are returned, I have forgotten what the logic was here.

I hadn't tried StructArrays but agree it's nice if they are preserved. StaticArrays are not, which is good. GPU arrays are, which is also good. (OffsetArrays work, by design, and have tests.)

julia> using StructArrays, StaticArrays, Metal, OffsetArrays


julia> As = [StructArray(a=[1,2]), StructArray(a=[3,4]), StructArray(a=[5,6])]
3-element Vector{StructVector{NamedTuple{(:a,), Tuple{Int64}}, NamedTuple{(:a,), Tuple{Vector{Int64}}}, Int64}}:
 [(a = 1,), (a = 2,)]
 [(a = 3,), (a = 4,)]
 [(a = 5,), (a = 6,)]

julia> stack(As)
2×3 StructArray(::Matrix{Int64}) with eltype NamedTuple{(:a,), Tuple{Int64}}:
 (a = 1,)  (a = 3,)  (a = 5,)
 (a = 2,)  (a = 4,)  (a = 6,)

julia> stack(merge.(a, ((b=10,),)) for a in As) isa StructArray
true

julia> @code_warntype stack(As)  # doesn't infer
...
Body::Union{StructArray{NamedTuple{(:a,), Tuple{Int64}}, 2, NamedTuple{(:a,), Tuple{Matrix{Int64}}}, Int64}, Matrix{NamedTuple{(:a,), Tuple{Int64}}}}


julia> Bs = [SA[1,2], SA[3,4], SA[5,6]];

julia> stack(Bs)  # don't want one big MArray
2×3 Matrix{Int64}:
 1  3  5
 2  4  6

julia> @code_warntype stack(Bs)  # this is fine
...
Body::Matrix{Int64}


julia> Cs = [MtlArray([1,2]), MtlArray([3,4]), MtlArray([5,6])];  # these are GPU arrays

julia> Metal.allowscalar(false)

julia> stack(Cs)
2×3 MtlArray{Int64, 2}:
 1  3  5
 2  4  6

julia> reduce(hcat, Cs) isa MtlArray
true

julia> stack(c .+ 1 for c in Cs) isa MtlArray
true

julia> @code_warntype stack(Cs)  # doesn't infer
...
Body::Union{MtlArray{Int64, 2}, Matrix{Int64}}


julia> Os = [OffsetArray([i,2i],3) for i in 1:3];

julia> Os[1] |> axes
(OffsetArrays.IdOffsetRange(values=4:5, indices=4:5),)

julia> stack(Os)
2×3 OffsetArray(::Matrix{Int64}, 4:5, 1:3) with eltype Int64 with indices 4:5×1:3:
 1  2  3
 2  4  6

julia> @code_warntype stack(Os)
...
Body::Union{Matrix{Int64}, OffsetMatrix{Int64, Matrix{Int64}}}

It would be good to figure out how to make these infer. (And how to test them.) My guess is that the handling of the empty case is what's throwing things off. @code_warntype reduce(hcat, As) is fine, but reduce(hcat, empty!(As)) throws an error. Maybe that's better, not sure what the policy should be. In the empty case, you don't have an example for similar, and can't use the type:

julia> similar([1,2,3], (4,5)) |> summary
"4×5 Matrix{Int64}"

julia> similar(typeof([1,2,3]), (4,5)) |> summary
ERROR: MethodError: no method matching Vector{Int64}(::UndefInitializer, ::Tuple{Int64, Int64})

Edit: I also don't know whether these cases are particularly fast. If packages want to overload this for a type they own, then it might be worth making sure there's an easy hook for that --- overloading stack(::Vector{<:CuArray}) won't work for iterators.

mcabbott avatar Jul 13 '22 21:07 mcabbott

this looks good to me, with pretty thorough and clear tests; is there anything holding it up?

ericphanson avatar Aug 09 '22 11:08 ericphanson

I hope it's nearly ready. Remaining questions are, I think:

  • What should happen in empty cases? Examples here and concerns about this introducing type-instabilities here. Maybe it should produce errors more often instead.
  • If packages like CUDA.jl or StaticArrays.jl or NamedDims.jl need to overload something to handle their type, does the PR provide appropriate hooks for doing so? They can't just add methods to the exported stack because of generators etc. like stack(f(x,y) for x in xs, y in ys).
  • "Can a HasLength iterator in fact collect to have multiple dimensions?" I know of no examples, but find the docs unclear on this.

mcabbott avatar Aug 09 '22 13:08 mcabbott

If packages like CUDA.jl or StaticArrays.jl or NamedDims.jl need to overload something to handle their type, does the PR provide appropriate hooks for doing so? They can't just add methods to the exported stack because of generators etc. like stack(f(x,y) for x in xs, y in ys).

GPUArrays should be OK in the common case:

julia> using JLArrays

julia> vecs = (jl(ones(3)), jl(zeros(3)))
([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])

julia> stack(vecs)
3×2 JLArray{Float64, 2}:
 1.0  0.0
 1.0  0.0
 1.0  0.0

But it looks like in the empty case maybe there's an issue?

julia> vecs = JLArray{Float64, 1}[]
JLArray{Float64, 1}[]

julia> stack(vecs)
1×0 Matrix{Float64}

I would say, though, that as long as hooks can be added in a non-breaking way, they shouldn't hold up this PR; but the semantics of empty arrays do sound like they matter for this PR.

I wonder if there's a way to unify the empty codepaths with non-empty while adding hooks by creating a stack_output_array function that replaces the uses of similar. Then maybe _empty_stack could call this with zero sizes, while non-empty stack uses this to build the undef'd output that we copyto!, and packages can overload this to customize the output type for given input types. Probably easier said than done...
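
Very roughly, that hook might look something like this (the name stack_output_array comes from the paragraph above; the signature and the types in the second comment are made up):

stack_output_array(first_slice, ::Type{T}, sz) where {T} = similar(first_slice, T, sz)   # default, as now
# a package could then specialize for its own slice type, e.g. (hypothetical types):
# stack_output_array(x::MyGPUVector, ::Type{T}, sz) where {T} = MyGPUMatrix{T}(undef, sz)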

ericphanson avatar Aug 09 '22 15:08 ericphanson

One overload question is what happens to immutable arrays: e.g. stack(SA[i,2i] for i in SA[1,2,3]) currently makes an MMatrix, but might prefer an SMatrix. And what should a stack of ImmutableArrays be?

For GPU arrays, I wonder a little if there should be a more parallelised path. As you say, adding a hook for that later is non-breaking.

The current good behaviour comes from these methods of similar:

julia> using StaticArrays, JLArrays, StructArrays

julia> similar(SA[1,2,3], Float32, 2,3) |> summary  # not an MArray as the given size isn't static
"2×3 Matrix{Float32}"

julia> similar(SA[1,2,3], Float32, axes(SA[1 2 3; 4 5 6])) |> summary
"2×3 MMatrix{2, 3, Float32, 6} with indices SOneTo(2)×SOneTo(3)"

julia> similar(jl([1,2,3]), Float32, 2,3) |> summary
"2×3 JLArray{Float32, 2}"

julia> similar(StructArray((x=[1,2,3],)), Float32, 2,3) |> summary
"2×3 Matrix{Float32}"

julia> similar(StructArray((x=[1,2,3],)), typeof((x=0f0,)), 2,3) |> summary
"2×3 StructArray(::Matrix{Float32}) with eltype NamedTuple{(:x,), Tuple{Float32}}"

julia> similar(Diagonal([1,2]), Float32, 2, 2, 5) |> summary
"2×2×5 Array{Float32, 3}"

Rather than have a function particular to stack to do better at the empty case, the seemingly obvious thing would be to add methods to similar accepting a type rather than an instance. For all of these, the type has all the information, so things like similar(typeof([1,2,3]), Float32, (4,)) could work.
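
For illustration, the kind of method that might make that work (a guess at a possible signature, not anything in Base today):

Base.similar(::Type{<:Array}, ::Type{T}, dims::Dims) where {T} = Array{T}(undef, dims)

similar(typeof([1, 2, 3]), Float32, (4,)) |> summary   # "4-element Vector{Float32}"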

One option would be to allow (for now) only empty cases which produce an Array, with all others an error. That would solve the type-stability issues. But perhaps that's too weird. If we make them all errors now, we can in principle upgrade similar & restore empty cases, later?

mcabbott avatar Aug 09 '22 20:08 mcabbott

If we make them all errors now, we can in principle upgrade similar & restore empty cases, later?

This path sounds good to me. reduce didn’t support init until 1.6 but was still very useful before then. So in my opinion erroring on empty arrays is better than not having the function, and with an error it’s easy to improve it non-breakingly. Although allowing empty arrays in special cases we can figure it out might be ok too; again following reduce, it handles some special cases without init.

edit: I think the current reduce(vcat, …) idiom was worth a try but ultimately is pedagogically wrong (teaches folks the wrong things) so we should try to have an alternative asap, and this stack is the right thing.

ericphanson avatar Aug 09 '22 22:08 ericphanson

Ok 72a9de9 makes all empty cases give errors. This restores type-stability in the StructArray & JLArray examples above.

mcabbott avatar Aug 10 '22 15:08 mcabbott

Awesome! Going back to the 3 outstanding questions: if we postpone hooks as a non-breaking future feature, then we're left with the last one, about HasLength iterators.

So the interface docs say

| Important optional methods | Default definition | Brief description |
| --- | --- | --- |
| IteratorSize(IterType) | HasLength() | One of HasLength(), HasShape{N}(), IsInfinite(), or SizeUnknown() as appropriate |

I think one can make a case that "as appropriate" indicates that you can't use HasLength() if it really has multiple dimensions, although I agree it's not totally clear.

Turning to the code, it looks like the collect methods in array.jl (both typed and untyped) use this internal _similar_shape function for deciding the output size:

https://github.com/JuliaLang/julia/blob/bcda258e070dcae432b995f4382e1f2a9c4b505a/base/array.jl#L657-L659

This seems to indicate that HasLength is one-dimensional. I'm not actually sure how to get collect to produce a multidimensional result from a HasLength iterator if you only define iterate and the traits and size. I think you'd need to provide a method directly for collect itself which feels like a violation of the protocol. But I don't have much experience with this.

So to me it seems OK as is, and then the PR just needs NEWS (and someone with merge rights to do so 🙂).

ericphanson avatar Aug 10 '22 22:08 ericphanson

4169526 adds a NEWS entry. Maybe this is ready?

mcabbott avatar Aug 15 '22 00:08 mcabbott

Should I read anything into the test failures? They appear the same on re-running, and don't seem to appear on other recent PRs. But they also look unrelated to me, e.g.:

:mac: test x86_64-apple-darwin

stdlib/v1.9/InteractiveUtils/test/runtests.jl:252
  Test threw exception
  Expression: !(occursin("Environment:", read(setenv(`$exename -e 'using InteractiveUtils; versioninfo()'`, String[]), String)))

:linux: test powerpc64le-linux-gnu

stdlib/v1.9/LinearAlgebra/test/qr.jl:426
  Got exception outside of a @test
  BoundsError: attempt to access 10×4 view(::Matrix{Float16}, 1:10, 2:5) with eltype Float16 at index [2:10, 2]
  Stacktrace:
  [1] throw_boundserror(A::SubArray{Float16, 2, Matrix{Float16}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, I::Tuple{UnitRange{Int64}, Int64})

mcabbott avatar Aug 16 '22 22:08 mcabbott

They should be unrelated, as we are pretty sure that stack is not used in Base and InteractiveUtils. I think this PR is ready to merge. Not sure whether other folks want to take a look.

Triage likes this. The name situation with DataFrames is tricky, but we do think stack is the canonical name so it would be good to go with that.

Let's merge this this weekend if there are no other objections.

N5N3 avatar Aug 17 '22 07:08 N5N3

I have been waiting for a function like this for years. Thank you @mcabbott and the Julia community at large, you all are amazing

raphaelchinchilla avatar Aug 22 '22 00:08 raphaelchinchilla