Low-latency ComponentArrays
Hi, I really enjoy using ComponentArrays. I use it as the state vector in OrdinaryDiffEq. One issue we ran into is that the sizes of our components change from one simulation to the next, so a lot of code has to be recompiled each time. The issue can be summarized as follows:
julia> a = ComponentVector(; a=[1], b=[2,3])
julia> b = ComponentVector(; a=[1,2], b=[3])
julia> typeof(a) == typeof(b)
false
This is because both the keys and the index values of the axes are encoded in the type. I was recently discussing this on Slack with @MasonProtter, @SouthEndMusic and @ChrisRackauckas.
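A minimal illustration of that difference, using two hypothetical stand-in types (TypeLayout and ValueLayout) rather than ComponentArrays' actual axis types: keeping the index map in a type parameter makes every layout a distinct type, while keeping it in a field puts only the names and index types into the type.

```julia
# Hypothetical stand-ins, not ComponentArrays' actual types.

# Index map stored as a type parameter (compile-time information).
struct TypeLayout{IdxMap} end

# Index map stored as a field (runtime information); only its type is a parameter.
struct ValueLayout{NT<:NamedTuple}
    idxmap::NT
end

# Layouts matching a = [1], b = [2, 3] and a = [1, 2], b = [3].
t1 = TypeLayout{(a = 1:1, b = 2:3)}()
t2 = TypeLayout{(a = 1:2, b = 3:3)}()
v1 = ValueLayout((a = 1:1, b = 2:3))
v2 = ValueLayout((a = 1:2, b = 3:3))

println(typeof(t1) == typeof(t2))  # false: new sizes mean a new type, so new compilation
println(typeof(v1) == typeof(v2))  # true: same names and index types, so compiled code is reused
```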
I made a small prototype struct, CArray, as a possible replacement for the current ComponentArray, and I'd like some early feedback: would folks be interested in this, and are there flaws in the design?
I haven't focused on matching the API yet, but I did make sure that range, integer and nested components all work. The hope is that this can be mostly compatible, though it would probably still be breaking. Some quick (possibly flawed) benchmarks show similar performance.
struct CArray{T, N, A<:DenseArray{T,N}, NT} <: DenseArray{T, N}
    data::A   # flat backing storage
    axes::NT  # component locations, stored as a runtime value rather than in the type
end
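To make the idea concrete, here is a minimal sketch of a CArray-like vector. It is not the gist's code: MiniCVector is a hypothetical name, it is one-dimensional, it wraps a plain Vector instead of any DenseArray, and it assumes every component location is a range. The point is that the layout lives in a field, so two vectors with different component sizes share one type.

```julia
# Hypothetical one-dimensional sketch in the spirit of CArray; not the gist's implementation.
struct MiniCVector{T, NT<:NamedTuple} <: AbstractVector{T}
    data::Vector{T}  # flat backing storage
    axes::NT         # component name => index range, e.g. (a = 1:1, b = 2:3)
end

# Forward the array interface to the backing vector. getfield is used because
# getproperty is overloaded below for component access.
Base.size(x::MiniCVector) = size(getfield(x, :data))
Base.getindex(x::MiniCVector, i::Int) = getfield(x, :data)[i]
Base.setindex!(x::MiniCVector, v, i::Int) = (getfield(x, :data)[i] = v)

# x.name looks the range up at runtime and returns a non-copying view.
function Base.getproperty(x::MiniCVector, name::Symbol)
    loc = getproperty(getfield(x, :axes), name)
    return view(getfield(x, :data), loc)
end

x1 = MiniCVector([1.0, 2.0, 3.0], (a = 1:1, b = 2:3))
x2 = MiniCVector([1.0, 2.0, 3.0], (a = 1:2, b = 3:3))
typeof(x1) == typeof(x2)  # true: the sizes are runtime data
x1.b                      # view of [2.0, 3.0]
```

The trade-off is the one discussed in this thread: the range lookup now happens at runtime instead of being constant-folded from the type.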
The full prototype is here: https://gist.github.com/visr/dde7ab3999591637451341e1c1166533
Yeah, I think this is a good idea. I prototyped something like this a while back, when I was first writing ComponentArrays, and decided against it because microbenchmarks were slightly slower and I wanted this to be a truly zero-cost abstraction. As I've come to use it more, though, I've realized that was a mistake: it has prevented both resizable inner arrays and, maybe even more importantly, type-stable construction when there are array components. If this prototype solves those problems and is only a little slower, you'd have my vote.
I think there's a not-so-difficult way to implement this. The core type is struct Axis{IdxMap} <: AbstractAxis{IdxMap} end; IdxMap holds all of the information needed for indexing. What you want to do is create something like:
abstract type AbstractRuntimeAxis <: AbstractAxis{Nothing} end # Maybe? Or maybe make a new highest-level abstract type
struct RuntimeAxis{T} <: AbstractRuntimeAxis
    idxmap::T
end
Once you have that, you just adjust https://github.com/SciML/ComponentArrays.jl/blob/56f1fe806667b1284cbd84e8dd7932083f97ebf1/src/axis.jl#L3 by adding a method:
@inline indexmap(x::RuntimeAxis) = x.idxmap
Now RuntimeAxis should be usable wherever Axis was, except that it carries idxmap as runtime information instead of as type information. Then we just need to decide which constructor should opt into a RuntimeAxis instead: RTComponentArray, ComponentArray{false}, or something else.
You'll want to handle a few other axis types the same way, like the view axes, and then make sure that operations on runtime axes produce runtime axes. After that, the whole package should carry over to this form more or less automatically: it seems to use indexmap everywhere, so it is already generic over the axis type (it has to be, for the view axis machinery to work).
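A self-contained mock of this pattern may help. The names follow the comment above, but AbstractAxis, Axis and indexmap are redefined locally here rather than loaded from ComponentArrays, and getloc and subaxis are hypothetical helpers: getloc stands in for package code that only goes through indexmap, and subaxis shows a derived axis keeping its kind (runtime in, runtime out).

```julia
# Local stand-ins mirroring the names above; NOT ComponentArrays' own definitions.
abstract type AbstractAxis{IdxMap} end

# Compile-time axis: the index map is the type parameter itself.
struct Axis{IdxMap} <: AbstractAxis{IdxMap} end
@inline indexmap(::AbstractAxis{IdxMap}) where {IdxMap} = IdxMap

# Runtime axis: the index map is an ordinary field, so only its type is specialized on.
abstract type AbstractRuntimeAxis <: AbstractAxis{Nothing} end
struct RuntimeAxis{T} <: AbstractRuntimeAxis
    idxmap::T
end
@inline indexmap(x::RuntimeAxis) = x.idxmap  # more specific than the generic method

# Hypothetical generic code: it only uses indexmap, so it works with either axis kind.
getloc(ax::AbstractAxis, name::Symbol) = getproperty(indexmap(ax), name)

# Hypothetical derived-axis operation: each axis kind produces the same kind.
subaxis(ax::Axis, name::Symbol) = Axis{getloc(ax, name)}()
subaxis(ax::RuntimeAxis, name::Symbol) = RuntimeAxis(getloc(ax, name))

idx = (a = 1:1, b = (c = 2:2, d = 3:4))
static_ax = Axis{idx}()
runtime_ax = RuntimeAxis(idx)

getloc(static_ax, :a)    # 1:1
getloc(runtime_ax, :a)   # 1:1
subaxis(runtime_ax, :b)  # a RuntimeAxis wrapping (c = 2:2, d = 3:4)

# Different sizes: a new type for Axis, the same type for RuntimeAxis.
typeof(RuntimeAxis((a = 1:2, b = (c = 3:3, d = 4:4)))) == typeof(runtime_ax)  # true
```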
You could even make a version where idxmap holds vectors instead of tuples, if you want the index map to be completely mutable and completely runtime. That would incur more overhead than the NamedTuple version. The NamedTuple version is probably the most useful one here, but the vector version is worth considering, since it would be the case with zero compile-time specialization.
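A rough sketch of the fully runtime variant, assuming a hypothetical VectorIdxMap that stores names and ranges in plain vectors. Lookups become a linear search at runtime (part of the extra overhead mentioned above), but the type no longer depends on the names or sizes, so nothing needs to be specialized per layout.

```julia
# Hypothetical fully runtime index map; not part of ComponentArrays.
struct VectorIdxMap
    names::Vector{Symbol}
    ranges::Vector{UnitRange{Int}}
end

# Linear search over the names: the runtime cost paid for avoiding specialization.
function location(m::VectorIdxMap, name::Symbol)
    i = findfirst(==(name), m.names)
    i === nothing && throw(KeyError(name))
    return m.ranges[i]
end

m = VectorIdxMap([:a, :b], [1:1, 2:3])
# The layout can be changed in place: a grows, b shrinks, and c is added.
m.ranges[1] = 1:2
m.ranges[2] = 3:3
push!(m.names, :c)
push!(m.ranges, 4:6)
location(m, :c)  # 4:6

# Even maps with different names share one type, unlike NamedTuples:
typeof(VectorIdxMap([:a], [1:1])) == typeof(VectorIdxMap([:x, :y], [1:2, 3:3]))  # true
typeof((a = 1:1,)) == typeof((x = 1:2, y = 3:3))                                  # false
```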
Yeah, that would probably be the easiest way to get this in as an additional opt-in feature.
If I suppress all output printing (to avoid a StackOverflowError), I can construct a ComponentArray that doesn't include the axis values in its type:
ax = RuntimeAxis((; a = 1:1, b = 2:3))
ca = ComponentArray([1.0, 2.0, 3.0], ax);
typeof(ca)
# ComponentArrays.LazyArray{Float64, 1, Base.Generator{Vector{Float64}, ComponentArrays.var"#18#19"{Tuple{RuntimeAxis{@NamedTuple{a::UnitRange{Int64}, b::UnitRange{Int64}}}}}}}
I can't really get any components out yet; everything gets wrapped in ComponentArrays.LazyArray.
My feeling is that if we switched to something like the gist, we could significantly simplify the ComponentArrays code, at the cost of some breaking changes. That said, I don't know enough about the history behind ShapedAxis, PartitionedAxis, ViewAxis and CombinedAxis to tell whether all of that is really needed.
Note, by the way, that the prototype doesn't restrict the axes to a particular type, so anything that supports the following would work:
loc = getproperty(axes, name)
component(data, loc)
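As a sketch of that duck typing, assuming hypothetical component methods (the prototype's actual definitions are in the gist, and Shaped and getcomp are illustrative names): each kind of location gets its own component method, and getcomp only relies on getproperty plus component, so a new location type just needs one more method.

```julia
# Hypothetical `component` methods; the prototype's real ones live in the gist.
component(data, loc::Integer) = data[loc]                  # scalar component
component(data, loc::AbstractUnitRange) = view(data, loc)  # vector component, no copy
# Nested components: apply `component` to every location in the inner NamedTuple.
component(data, loc::NamedTuple) = map(l -> component(data, l), loc)

# A new location type only needs its own method, e.g. a reshaped block.
struct Shaped{R<:AbstractUnitRange, S<:Tuple}
    range::R
    shape::S
end
component(data, loc::Shaped) = reshape(view(data, loc.range), loc.shape)

# The lookup from the prototype: fetch the location by name, then dispatch on it.
getcomp(data, ax, name::Symbol) = component(data, getproperty(ax, name))

data = collect(1.0:10.0)
ax = (a = 1, b = 2:3, c = Shaped(4:7, (2, 2)), d = (e = 8:9, f = 10))
getcomp(data, ax, :a)  # 1.0
getcomp(data, ax, :b)  # view of [2.0, 3.0]
getcomp(data, ax, :c)  # 2x2 reshaped view: [4.0 6.0; 5.0 7.0]
getcomp(data, ax, :d)  # (e = view of [8.0, 9.0], f = 10.0)
```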
I think ShapedAxis, PartitionedAxis, ViewAxis and CombinedAxis are pretty much needed for all of the reshape and other odd cases to work out. There are a lot of tests, though, so if you can get them to pass, good on you. But I think that's fighting a devil: that machinery was added for a reason and it's complex for a reason, so I'd just go with it.