ArrayPartition violates the documentation of eltype
Describe the bug 🐞
In some situations ArrayPartition violates the documentation of eltype:
eltype(type)
Determine the type of the elements generated by iterating a collection...
But this is not always the case (see MWE below).
As I understand the ArrayPartition declares eltype as a type that all udnerlying elements can be promoted to.
This behaviour might be by design, but it has a real practical implication.
Perhaps can be fixed on ForwardDiff side by extra promoting, but this can have other weird edge cases.
At the very least this behaviour can be documented as desired?
Minimal Reproducible Example 👇
julia> v = ArrayPartition([ 0.0 ], [ 1 ])
([0.0], [1])
julia> for e in v
@show e isa eltype(v)
end
e isa eltype(v) = true
e isa eltype(v) = false # would expect `true` here
Environment (please complete the following information):
- Output of
using Pkg; Pkg.status()
(jl_ZYBD4c) pkg> st
Status `/private/var/folders/kf/1t2_wf7n49sgxmzzmmgsb1j40000gn/T/jl_ZYBD4c/Project.toml`
[731186ca] RecursiveArrayTools v3.26.0
- Output of
versioninfo()
julia> versioninfo()
Julia Version 1.10.4
Commit 48d4fd48430 (2024-06-04 10:41 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
CPU: 10 × Apple M2 Pro
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 6 virtual cores)
As I understand the ArrayPartition declares eltype as a type that all udnerlying elements can be promoted to.
I don't think this is quite true, although I'm not sure what the real rule is:
julia> ap = ArrayPartition([[1,2]], [3,4], [(5,6)])
([[1, 2]], [3, 4], [(5, 6)])
julia> eltype(ap)
Int64
julia> [typeof(x) for x in ap]
4-element Vector{DataType}:
Vector{Int64} (alias for Array{Int64, 1})
Int64
Int64
Tuple{Int64, Int64}
The simplest fix would be to declare a union eltype, possibly via Base.promote_typejoin. Alternatively, make getindex call convert(eltype(ap), val), although this will fail on read for the example above -- it would be better to fail on construction, if such things aren't supported.
The reasoning behind this behavior is because there is a common use case with Unitful where a system of second order ODEs is split in two, but some of the values are positions while the others are velocities. This in fact is what started ArrayPartition and still tends to be it's most widely used case in the wild, just without Unitful. Because of this, ArrayPartition operations tend to be specialized to not require the global index. For example, broadcast will not use it, which is why there's not a performance hit for it's uses in SciML. The downside of course is that it cannot guarantee performance on the abstractvector form.
That said, the linear index is never particularly fast because it has branching to consider and this removes SIMD optimizations. So you pretty much only want to use it as a fallback anyways. That makes ArrayPartition an odd AbstractVector: it has all of the interface defined, but most functions like iterators have a faster dispatch by avoiding treating it like a Vector. Thus the Abstractvector piece is more for convenience than performance. The Array interface overloads then confirm this by setting traits to say this does not have fast linear indexing, telling downstream codes to use alternative code paths.
With all of that in mind, eltype is then somewhat not well-defined in this case, though what would make it interface compliant would be to ensure it is matching the iteration definition correctly. That may need a bit of a bug fix and a bit more testing which shouldn't be too hard. But, I would still stress that even if you get a concrete eltype from this, you still likely want to specialize functions as possible if you need performance, and most common array functions (any, all, etc.) already have these specializations defined
Everything makes sense, but I don’t think performance is a significant concern here. Linear indexing may not be the fastest, but that’s acceptable. If eltype is not frequently used and specialized dispatch is more common, it might be worthwhile to adjust the definition to be more compatible with AbstractVector.
Also the example provided from @mcabbott is really bad imo which is a gun waiting to blew up someones leg:
julia> ap = ArrayPartition([[1,2]], [3,4], [(5,6)])
([[1, 2]], [3, 4], [(5, 6)])
julia> eltype(ap)
Int64
julia> collect(ap)
ERROR: MethodError: Cannot `convert` an object of type Vector{Int64} to an object of type Int64
Closest candidates are:
convert(::Type{T}, ::T) where T<:Number
@ Base number.jl:6
convert(::Type{T}, ::T) where T
@ Base Base.jl:84
convert(::Type{T}, ::Number) where T<:Number
@ Base number.jl:7
...
Stacktrace:
[1] setindex!(A::Vector{Int64}, x::Vector{Int64}, i1::Int64)
@ Base ./array.jl:1021
[2] setindex!
Interestingly enough, this works fine (why the previous one fails then?)
julia> ap = ArrayPartition([1], ["string"])
([1], ["string"])
julia> collect(ap)
2-element Vector{Any}:
1
"string"
julia> eltype(ap)
Any
Okay I see where the confusion is coming from. ArrayPartition has functionality for being nested (like all array types in this package). In that case, the eltype is the eltype found on the bottom, and it walks using the bottom of the tree.
julia> ap = ArrayPartition((ArrayPartition([1,2]), ArrayPartition([3,4]),))
(ArrayPartition{Int64, Tuple{Vector{Int64}}}(([1, 2],)), ArrayPartition{Int64, Tuple{Vector{Int64}}}(([3, 4],)))
julia> eltype(ap)
Int64
julia> ap = ArrayPartition((ArrayPartition([1,2]), ArrayPartition([3.0,4]),))
(ArrayPartition{Int64, Tuple{Vector{Int64}}}(([1, 2],)), ArrayPartition{Float64, Tuple{Vector{Float64}}}(([3.0, 4.0],)))
julia> eltype(ap)
Float64
That is using recursive_bottom_eltype. The vector indexing is then defined on the bottom walk, so this would be length 4.
I think the eltype recursion is mixing up with tuples in there. It's this line of code:
https://github.com/SciML/RecursiveArrayTools.jl/blob/master/src/array_partition.jl#L35
I think the issue here is we need to more cleanly define what the bottom is: i.e. is ap[4] == 5 or ap[4] == (5,6)?
julia> ap = ArrayPartition([[1,2]], [3,4], [(5,6)])
([[1, 2]], [3, 4], [(5, 6)])
julia> ap[4]
(5, 6)
Currently it recurses the eltype definition there but does not recurse the vector definition, so that's the disconnect. I.e. should ap = ArrayPartition([[1,2]], [3,4], [ArrayPartition(5,6)]) give a different vector form? Currently this case is left undefined in ArrayPartition's definition so it does something but it doesn't do something sensible.
There's a few solutions I see for this:
- Change the indexing to implicitly recurse collections. This is naturally hard though because "what is a collection" can be hard to asess.
- Keep the indexing the same and fix the element type.
- Error if a collection is passed in as an element, telling people that if they wish to recurse collections they should make them ArrayPartitions.
It sounds like 2 would be the least breaking here, and what people would expect?
I believe that support for better recursive arrays/nesting is a separate issue and deserves its own discussion. This particular issue is about the discrepancy between the eltype declaration and the actual type generated by iterating over the collection. Specifically, the eltype declares a different type than what is produced during iteration. As demonstrated in the original example where eltype declares Float64, but iteration returns Int. Nested collections were just an extreme example of such discrepancy, but this issue occurs even without nested collections.
This violation of the assumption that iterating the collection returns elements of eltype can cause errors (or silently produce incorrect results??) in other libraries (e.g., ForwardDiff).
If this behavior is intentional, it should at least be documented. However, it might be a bug, and eltype should perhaps be Union{Float64, Int} instead or Real. Alternatively, the iteration (and getindex) could explicitly perform type conversion with convert.
@oscardssmith do you have thoughts on this?
I think the simplest solution would be to do promotion on access or setindex. not fully sure what the implications would be though
I think that's reasonable. Can you give it a shot?