ArraysOfArrays.jl
ArraysOfArrays.jl copied to clipboard
getindex(.) return type does not match eltype
First off, awesome idea/package! It's been quite useful in my own work.
I was really excited to use your package in conjunction with StructArrays but ran into an issue (that I attempted to fix with a PR to StructArrays) as explained here. The owner of that package pointed out, however, that the real issue is the following:
julia> x=nestedview(rand(2,3,10), 2);
julia> x[1] isa eltype(x)
false
What are your thoughts on modifying eltype to be consistent with the true return type of getindex?
-Colin
Thanks!
Yes, the getindex/eltype inconsistency is annoying, esp. when using it with StructArray or TypedTable.
I definitely want to solve that, but it's a bit tricky - just using the underlying array type as eltype is not enough, because we get into trouble (e.g. with broadcast) if it's a view.
There's also this problem: If we just make eltype returns the right thing, then there would be a mismatch between eltype and the element type deduced from T in foo(A::AbstractArray{T) on an ArraysOfArrays array. I think a lot of code is written based on the assumtion that T and eltype(A) match.
Just to pitch in (as this was brought up in https://github.com/piever/StructArrays.jl/pull/84), I think that changing the eltype should be done not by adding a custom eltype method but by giving the correct type parameter to AbstractArray, i.e. ArrayOfArray<:AbstractArray{SubArray...}. Implementation-wise, this should be an extra type parameter of ArrayOfArray that gets computed in an internal constructor.
I fully agree, @piever. That's what makes it a bit more tricky - but I have an idea (like you said, via constructor), I'll try it out.
@piever, @colinxs, haven't forgotten about this - holiday season, but it's definitely on my short-term ToDo list.
Update: Haven't forgotten about this, just had to be put on hold for some other urgent work. Will get back on this soon.
Sorry this is taking so long - it's not forgotten and definitely still on my to-do list!
@oschulz no worries, I actually figured this out and have been using it in my own codebases:
# For AbstractArray A and indices I, compute V = typeof(view(A, I...)) by:
# 1) Try to infer V from typeof(A) and typeof(I) by calling viewtype(typeof(A), typeof(I))
# 2) If !isconcretetype(V), fall back to typeof(view(A, I...))
# Custom subtypes of AbstractArray (e.g. UnsafeArray) could provide customized implementations
# of viewtype(A::AbstractArray, I::Tuple) or viewtype(::Type{<:AbstractArray}, ::Type{<:Tuple})
# if required.
function viewtype(A::AbstractArray, I::Tuple)
T = viewtype(typeof(A), typeof(I))
_viewtype(A, I, T, Val(isconcretetype(T)))
end
viewtype(A::AbstractArray, I...) = viewtype(A, I)
function viewtype(::Type{A}, ::Type{I}) where {A<:AbstractArray,I<:Tuple}
Core.Compiler.return_type(view, _view_signature(A, I))
end
Base.@pure _view_signature(::Type{A}, ::Type{I}) where {A,I<:Tuple} = Tuple{A, I.parameters...}
_viewtype(A, I, T, ::Val{true}) = T
function _viewtype(A, I, T, ::Val{false})
try
return typeof(view(A, I...))
catch e
Istring = join(map(string, I), ", ")
printstyled(stderr,
"""
Unable to infer typeof(view($(typeof(A)), $(join(map(i -> string(typeof(i)), I), ", "))))
Only other option is to try typeof(view(A, $Istring)) but that resulted in below error.
Try passing in valid indices or implement:
viewtype(::Type{$(typeof(A))}, ::Type{$(typeof(I))})
""", color=:light_red)
rethrow(e)
end
end
I've got an in-works implementation something similar to ArrayOfSimilarArrays that I'll make a PR for.
Thanks again for keeping on top of this! :)
Also, two other options for _viewtype(A, I, T, ::Val{false}) that I considered were:
_viewtype(A, I, T, ::Val{false}) = @inbounds view(A, I...)
@propagate_inbounds _viewtype(A, I, T, ::Val{false}) = view(A, I...)
The first has the benefit of working for empty arrays, but that seems unsafe. The second option
I rejected because that would require e.g. @inbounds nestedview(A_flat, 2), which seemed weird to do when calling a constructor.
Thanks, I really got to add this soon!
As an alternate idea, you could use the SlicedArray that I made here.
I ended up needing the ability to slice along arbitrary dimensions instead just the inner M dimensions and ran across JuliennedArrays.jl. I took the best parts of that and ArraysOfArrays to make SlicedArray which I use internally for Lyceum. An ArrayOfArrays{T,M} could be implemented as:
function sliceinner(A::AbstractArray{<:Any,L}, ::Val{M}) where {L,M}
slice(A, ntuple(_ -> Colon(), Val(M))..., ntuple(_ -> *, Val(L-M))...)
end
You could also use JuliennedArrays directly, SlicedArray is the same idea just with more overloads for Base functions, comprehensive tests, and better performance. I'd like to upstream those changes in the future but haven't had time yet. If this sounds useful I could register SpecialArrays (currently it's only registered in the Lyceum registry).
As an aside, you might find some of the test utilities I wrote useful, given that you have a lot of packages providing various custom arrays. That test suite is what originally prompted the changes to ElasticArrays I made in https://github.com/JuliaArrays/ElasticArrays.jl/pull/20. Perhaps factoring that out to something like ArrayTestTools.jl would be useful to JuliaArrays/others?
I could register SpecialArrays
Hm, since ArraysOfArrays is already registered, and since there would be conflicts with some exported functions (like innersize), I guess I would prefer to backport your improvements to ArrayOfSimilarArrays. If the package has to evolve in the process (incl. some API changes), that would be fine. I've actually been planning to move it under the "JuliaArrays" organization to make it a bit more "official" - talked to Tim Holy about it a JuliaCon, just haven't gotten 'round to it yet.
Perhaps factoring that out to something like ArrayTestTools.jl would be useful to JuliaArrays/others?
Yes, that might indeed be very useful!
Makes sense to backport the changes ArraysOfArrays.jl! Given that SlicedArray (and also FlattenedArray, which is a flattened view of a nested array) were inspired largely by JuliennedArrays.jl it'd be good to check with its dev (@bramtayl) first? The way forward could then be to upstream SlicedArray/FlattenedArray to JuliennedArrays.jl and then have ArraysOfArrays.jl depend on that? The alternative would be to just put everything in ArraysOfArrays.jl directly.
In the meantime, I'll work on factoring out ArrayTestTools.jl.
It makes me so happy when I find out people are reading my packages! Happy to accept PRs to JuliennedArrays
@bramtayl, I just came across JuliennedArrays a while ago - from what I saw, it's quite similar to
ArrayOfSimilarArrays, but trades increased flexibility (slicing in any dimension) for some performance?
Would you possibly be interested in moving it into ArraysOfArrays, or would you prefer to keep it as a lightweight separate package?
@colinxs having a FlattenedArray in ArraysOfArrays would be great - I've been meaning to add something like that, to replace the current generic implementation of flatview
@inline flatview(A::AbstractArray{<:AbstractArray}) = Base.Iterators.flatten(A)
with something that actually returns an AbstractArray-subtype. One of the things that kept sliding to the bottom of my to-do list ... :-)
@oschultz I'd be thrilled to move it into ArraysOfArrays so I don't have to maintain the code anymore. I'd be curious to know more about performance differences
hahah and I love when devs are super responsive and eager to accept PRs :).
@bramtayl, when you have a moment want to look over SlicedArray and FlattenedArray. The major changes are:
Slices
- renamed to
SlicedArrayto matchBase's naming convention (SubArray,PermutedDimsArray) - Exposes
slice(A, dims::Tuple),slice(A, dims::Integer...), andslice(A, Union{Colon,typeof(*)})as the user-facing API (similar toSubArrayandview).slice(A, :, *, :)would be equivalent toSlices(A, True(), False(), True()), it's just syntactical sugar. - Adds several methods for
Basefunctions likeresize!,append!etc. - Adds a drop-in replacement for
Base.mapslices - Adds adjoints for Zygote.jl (conditionally loaded using Requires.jl)
Align
- Renamed to
FlattenedArray - Exposes
flatview(A)as the user-facing API - Removed the ability to specify
alongssince you can get the same functionality by composing withPermutedDimsArray - Adds adjoints for Zygote.jl
I also significantly re-wrote the internals for better inference/performance and added full test coverage.
Concerning the performance difference: there isn't any. All of the fancy indexing that allows slicing along arbitrary dimensions effectively compiles out.
@colinxs those all sound like improvements to me. Earlier versions of JuliennedArrays used * and : to specify dimensions (I think that was a Tim Holy suggestion). I changed to True and False because I got excited about the wider potential of Typed Bools (hasn't panned out yet). I'm not sure I understand about PermutedDimsArray.
I'd be thrilled to move it into ArraysOfArrays so I don't have to maintain the code anymore.
Haha, and here I thought we'd maintain it together. ;-)
I'd be curious to know more about performance differences
Good point, we should definitely do some benchmarking!
Regarding PermutedDimsArray - when I wrote ArrayOfSimilarArrays I actually thought that applications that needed to slice along another axis could us an ArrayOfSimilarArrays wrapped around a PermutedDimsArray. Would be interesting to see if that could achieve similar performance to JuliennedArrays.Slices. If so, maybe we could reduce code there quite a bit and make things easier to maintain if we join it all in ArraysOfArrays.
Adds adjoints for Zygote.jl
Oh, that would be nice to have here (ArraysOfArrays) too!
@bramtayl PermutedDimsArray just remaps dimensions e.g.:
julia> A=rand(2,3)
2×3 Array{Float64,2}:
0.307533 0.822543 0.530326
0.591914 0.782564 0.948809
julia> B=PermutedDimsArray(A, (2,1))
3×2 PermutedDimsArray(::Array{Float64,2}, (2, 1)) with eltype Float64:
0.307533 0.591914
0.822543 0.782564
0.530326 0.948809
julia> A[1,2] == B[2,1]
true
So if you had A = [rand(2,3) for i=1:4], Align(A, True(), True(), False()) and PermutedDimsArray(Align(A, False(), True(), True()), (2, 3, 1)) would be equal.
Oh I see. Yes that makes sense to me. I'm not sure if there would be a performance penalty for PermutedDimsArray?
@oschulz
I thought about doing the same thing, but you miss out on some optimizations with the extra layer of indirection. If we had:
A = rand(2,3,4)
B = slice(A, :, *, *)
Then doing B[:, 1] as implemented is basically a no-op because we can just forward the Colon to the parent array and "re-slice". See this comment and the subsequent getindex definition for an example.
If you rely on PermutedDimsArray, then you'd be falling back to Cartesian indexing and collecting the output in a new array.
@bramtayl
SlicedArray still uses True/False internally, but I figured using :/* was a friendlier user-facing API.
I like : / *!
But in your example above - slice(A, :, *, *) could already be done with just an ArrayOfSimilarArrays, without PermutedDimsArray, right? Or do if mix : and * up, here`?
In any case, if a more generic version of ArrayOfSimilarArrays is possible without loss of performance, we could add it to ArraysOfArrays and reduce ArrayOfSimilarArrays to a type alias (possibly with an optimized implementation for some functions, if necessary).
@oschulz exactly, but ArrayOfSimilarArrays is just a special-case of a SlicedArray or @bramtayl's Slices.
Basically, if we incorporate SlicedArray/Slices into ArraysOfArrays.jl then it would entirely replace ArrayOfSimilarArrays.
@oschulz @bramtayl given the above improvements I made would you all be comfortable incorporating FlattenedArray/SlicedArray instead of Align/Slices?