AxisArrays.jl WIP: Access the values of an axis with getproperty

WIP: Access the values of an axis with getproperty

Open c42f opened this issue 5 years ago • 22 comments

Here's one possible use for getproperty with an AxisArray. I'm not yet sure this is a good idea (being an extremely new user of AxisArrays), but it seems rather convenient. [Edit: After a few days of using this I'd say that it's extremely convenient.]

A small demo in a signal processing-like context:

julia> using Unitful, AxisArrays

julia> A = AxisArray(rand(10,2), Axis{:time}((0:9)u"ms"), Axis{:channel}(1:2))
10×2 AxisArray{Float64,2,Array{Float64,2},Tuple{Axis{:time,StepRange{Quantity{Int64,𝐓,Unitful.FreeUnits{(ms,),𝐓,nothing}},Quantity{Int64,𝐓,Unitful.FreeUnits{(ms,),𝐓,nothing}}}},Axis{:channel,UnitRange{Int64}}}}:
 0.891098  0.904336
 0.291428  0.125264
 0.116863  0.639308
 0.217566  0.972434
 0.656042  0.517365
 0.758901  0.723122
 0.856255  0.77854 
 0.671771  0.695505
 0.793823  0.86887 
 0.302142  0.837619

julia> A.time
0 ms:1 ms:9 ms

julia> A.channel
1:2

julia> plot(ustrip.(A.time), A)
# ...

Dec 05 '18 07:12 c42f

I do think it's very important to have a better way to access the values of an axis, and this seems as good as any.

Dec 05 '18 19:12 iamed2

Cool, I've added

A little something to the docs
tests, particularly that this is fully inferred when a const name is supplied
propertynames so that tab completion will work in the REPL.

I'm not sure whether propertynames should include :data and :axes? I left these out for now as I assume data and axes field names are not part of the public API. (But is there an approved way to unwrap things to get at data?)

Dec 06 '18 00:12 c42f

CI failures seem to be an unrelated problem in a different test (only on julia master)

Dec 06 '18 02:12 c42f

I would implement propertynames such that private=true shows the data and axes fields.

There is currently no public interface to get data.

Inferred tests are really good! I would test that data and axes are still inferred as well.

Dec 06 '18 16:12 iamed2

Oh, excellent point, I didn't know about that the private flag to propertynames. Fixed now + extra tests added.

Dec 07 '18 03:12 c42f

Documenter CI failures should be fixed by #153

Dec 11 '18 02:12 c42f

Test failure on nightly is addressed by https://github.com/JuliaArrays/OffsetArrays.jl/pull/62.

I really like this---I'd even favor it as a way of resolving the issue of renaming AxisArrays.axes. Should we deprecate that as part of this change?

But there's one niggling issue: do you return the range values or the corresponding Axis object? I'd actually favor the latter because of this possible construction:

for i in eachindex(A.time)
    timeslice = @view A[A.time(i)]
    ...
end

Dec 11 '18 13:12 timholy

But there's one niggling issue: do you return the range values or the corresponding Axis object? I'd actually favor the latter because of this possible construction:

I think then we need a public interface for getting the values of an axis. But perhaps now that we have the possibility of deprecating field access with getproperty, it's okay to recommend ax.val?

I really like this---I'd even favor it as a way of resolving the issue of renaming AxisArrays.axes. Should we deprecate that as part of this change?

At the very least the multi-arg version should be deprecated IMO

Dec 11 '18 16:12 iamed2

But there's one niggling issue: do you return the range values or the corresponding Axis object

Agreed, good question. The reason I didn't do this was because the user already knows (syntactically) which axis name they're accessing, so it seemed redundant to return the wrapped value.

So far in practical use this has been the right decision in most cases, but occasionally I've needed to re-wrap the axis which is annoying. One solution would be to make the Axis itself an AbstractArray; then the need to access val directly would be unnecessary. This also feels semantically sound.

However as the package is currently implemented, Axis is also used to hold scalars and other things. Perhaps it could be made to work though.

Dec 12 '18 00:12 c42f

At the very least the multi-arg version should be deprecated IMO

Perhaps we should do this. I'm a bit cautious about getproperty as a general interface because it's kind of strangely non-extensible — there can only be one getproperty per concrete type. Therefore it's great for writing code when you already know the concrete type, but can be rather bad for generic situations.

So for this reason I'm not sure getproperty is a solution to the axes conundrum. And perhaps it's also a sign that it should return the concrete val as well.

Would it make more sense just merge Base.axes and AxisArrays.axes together and make axes(::AxisArray, d) return an AxisUnitRange (which behaves like a one dimensional AxisArray, but is a subtype of AbstractUnitRange (the docs say axes must return AbstractUnitRanges))?

Dec 12 '18 05:12 c42f

So thinking about the Base.axes conundrum, I tried making an AxisUnitRange which is returned from Base.axes(A,i) for some A::AxisArray. It wraps together a combination of A.axes[i] and Base.axes(A.data)[i] and this seemed to work.

But then I wondered whether Axis itself should be this type, in which case we can merge the functionality of Base.axes and AxisArrays.axes entirely.

This would be a breaking change of course, because the Axis would then act as an AbstractUnitRange where indexing this range provides normal indices into the parent AxisArray rather than the "values along the axis" like it currently does. However, this is kind of consistent and symmetrical with the behavior of AxisArray itself.

We could just have a function axisdata / axisvalues or some such which extracts what is currently the .val field from the axis. With the current definitions this would be simply

axisvalues(ax::Axis) = ax.val
axisvalues(a::AxisArray, i) = a.axes[i]

@andyferris I know you've been thinking hard about things very much like this, I'd appreciate your thoughts!

Dec 12 '18 07:12 c42f

Merging the two is hard, see #81. Fundamentally the problem is that we're pretty committed to the notion that AbstractArrays get indexed by integers, and that axes returns a cartesian product of ranges with entries that can be used to index the array. In Interpolations.jl we've switched to A(x, y, z) for "indexing"-with-interpolation and are contemplating giving A[i, j, k] and A(x, y, z) completely different behaviors. Here that's less viable because people might want to index with a mixture of "counting" and "value" indexes.

Dec 12 '18 11:12 timholy

I have a swirling of thoughts in this space. Sorry, they are a bit disorganized.

Have we considered A.axes.time and A.axes.channel? We could make this struct element a named tuple instead of a tuple. It feels clear upon reading what this must mean, and is still reasonably short. Since AxisArray is a concrete type not an interface, you could safely replace AxisArrays.axes with a simple getfield/getproperty.
I feel that named axes and axis values are two separate concerns. Tackling both at once might be more challenging than composing the ideas?
I don't see why named axes couldn't eventually be solved with Base.axes returning a named tuple. There's probably some carnage to solve to make that work smoothly... but it feels in the realm of possibility. It would also be nice if getindex syntax supported "keyword" style syntax like A[time = :, channel = 1].
I feel that maybe axis values is really trying to encapsulate the concept of a multidimensional dictionary. We should maybe try to support those more directly? And have a super convenient way to get the array backing the multi-dimensional dictionary? (I have been prototyping this idea privately).
I have been considering that there might be benefits to having a global overload to getproperty(::AbstractArray, ::Symbol) that lets you treat every array as a table/dataframe. If A was an array of complex numbers, then A.im would be the imaginary component. With named tuples you have a pretty nice array interface for tables "out of the box" - that is, we make Julia both a "linear algebra" and a "tabular data" super-machine, to broaden our appeal to the data audience. This is certainly an "out there" idea that may be controversial so don't let this particular thought bubble slow you down here ;) (I mention this only because this is the first time I've seen an AbstractArray consider use getproperty for anything other than getting table columns).

and that axes returns a cartesian product of ranges with entries that can be used to index the array.

Technically, axes returns a list of unit ranges, keys returns a Cartesian product (or simple range). I'm thinking that the transformation from axes to keys, i.e. keys(A) == CartesianIndices(axes(A)...) works fine if axes(A) is a tuple or a named tuple. There could similarly be a AxisUnitRange <: AbstractUnitRange which is both a unit range and a 1D axis array at the same time. Does this make sense Tim, or am I crazy? (EDIT: I think I am a bit crazy).

Dec 12 '18 12:12 andyferris

I'm thinking that the transformation from axes to keys, i.e. keys(A) == CartesianIndices(axes(A)...) works fine if axes(A) is a tuple or a named tuple.

My only point was that if axes(v) returns something along the lines of (0.1:0.1:1.0,) then CartesianIndices(axes(v)) is going to fail since the values are not integers. And currently we store such ranges as the AxisArrays.axes(v). This is why I wonder if axes(v) and v.axes might return different things.

Dec 12 '18 15:12 timholy

Thanks Tim, it looks like @mbauman has already done versions of

"turn Axis into an AbstractUnitRange" (#58) and
"Create a wrapper for both the range and the Axis" (#81, in particular 218b5c51)

Having thought about it a bit more it does seem that we need the latter as a thing to return from Base.axes if it can be made to work. For example you want to refer to the Axis values independently of the indices of any array it might happen to index.

if axes(v) returns something along the lines of (0.1:0.1:1.0,) then CartesianIndices(axes(v)) is going to fail since the values are not integers

Yes, I was trying to suggest that the return of Base.axes(::AxisArray) would act like it currently does when indexed, but also wrap the Axis so you can get at that as well, if necessary. Very much like 218b5c51 if I'm reading it right.

Dec 12 '18 20:12 c42f

Right. #81 looks like great work.

I'm in favour of using keys and axes to control the behaviour of things like similar. For example, if Base always used axes instead of size I think StaticArrays would be significantly easier to implement (for making nested containers like Diagonal(SVector(1,2,3)) behave like a statically-sized array). Similarly it would be perfect if Diagonal(::AxisVector) would behave similarly to an AxisMatrix.

You still need a convenient way of accessing axis values, of course (hence this PR I suppose). So yes, axes(A) and A.axes might return different things.

Dec 12 '18 22:12 andyferris

Just be aware that #58 could kill this (concept: I collected images with non-uniform sampling time):

julia> import AxisArrays
                                                                                                                                                                                                                                                                               
julia> using AxisArrays: AxisArray, Axis
                                                                                                                                                                                                                                                                               
julia> img0 = rand(100, 100, 8);
                                                                                                                                                                                                                                                                               
julia> img = AxisArray(img0, Axis{:y}(1:100), Axis{:x}(1:100), Axis{:time}(sort(rand(8))));
                                                                                                                                                                                                                                                                 
julia> AxisArrays.axes(img)
(Axis{:y,UnitRange{Int64}}(1:100), Axis{:x,UnitRange{Int64}}(1:100), Axis{:time,Array{Float64,1}}([0.0157872, 0.252618, 0.284213, 0.452175, 0.806429, 0.809029, 0.850077, 0.960828]))

(And yes, we're actually doing that in practice these days.)

Dang we need traits.

Dec 12 '18 22:12 timholy

I didn't follow - what does #58 kill?

(And yes, we're actually doing that in practice these days.)

I can really see why you need something like AxisArrays to keep track of the relevant data.

A couple of quick questions (since I'm thinking of how to implement multi-dimensional dictionaries) - do you find you often need to access the time axis by integer index? Do you often index it by Float64 time? Do you mostly iterate through this axis? Or is this structure mostly used as a basis for interpolation?

Dang we need traits.

Dec 12 '18 23:12 andyferris

Sorry, what I meant was that if Axis <: AbstractUnitRange then I think you want to make sure that step(::Axis) is defined, but of course if it's storing a Vector then there may be no step.

Dec 12 '18 23:12 timholy

A couple of quick questions (since I'm thinking of how to implement multi-dimensional dictionaries) - do you find you often need to access the time axis by integer index? Do you often index it by Float64 time? Do you mostly iterate through this axis? Or is this structure mostly used as a basis for interpolation?

Most often we iterate over the time axis. Rarely indexed by Float64 time to find a particular slice, but increasingly we use such information for interpolation. There are other ways to tackle many of these things, but one concern is that if we narrow what an AxisArray can store in exchange for other conveniences, we might need an AlmostAxisArrays package. Which could be vaguely annoying.

Dec 12 '18 23:12 timholy

Right! This is precisely why I’m interested in dictionaries.

A dictionary is a container where lookup from key to value is fast. What you wrote is basically a multidimensional sort-based dictionary; there is a logarithmic time algorithm to find a given time slice. When there is a uniform step, this cost is reduced to O(1). When it is a unit range, the cost is basically eliminated (you may need to subtract an offset).

Some axes might be categorical and be suited to a hash-based approach. Others might be small and brute-force lookup may be best (or else be “static” as in StaticArrays).

A multidimensional dictionary should seemlessly handle all the cases from a dense ND array to a hash map with full efficiently.

Dec 12 '18 23:12 andyferris

This has always seemed super clunky to me

for i in AxisArrays.axes(img, Axis{:x}())
    ...
end

I collected images with non-uniform sampling time

We do this a lot as well. For me, the flexibility to have non-uniform sampling is a large advantage of AxisArrays` over Base.

Aug 30 '19 00:08 tlnagy

AxisArrays.jl AxisArrays.jl copied to clipboard

WIP: Access the values of an axis with getproperty

AxisArrays.jl
AxisArrays.jl copied to clipboard