ArrayInterface.jl icon indicating copy to clipboard operation
ArrayInterface.jl copied to clipboard

interface for rewrapping Array wrappers

Open rafaqz opened this issue 4 years ago • 11 comments

Some wrapper types like AxisArray, NamedDImsArray and DimArray (and probably LabelledArray?) need to be the "outside" wrapper for, e.g. keeping the axis index accurate and dispatch for getindex methods.

This issue just came up on discourse: https://discourse.julialang.org/t/multiple-inheritance-carry-over-members/58355/8

People kind of expect a DimArray to keep working after wrapping with other types. It could be useful to have a trait and a shared method here that allows rewrapping:

rewrap(f, A) and rewrap(f, A, I...) could be used in other array constructors to allow rewrapping where necessary.

isouterwrapper could indicate if an object is an outer wrapper type.

rafaqz avatar Apr 02 '21 23:04 rafaqz

I think this would be a band-aid for the real issue, which is that these arrays are all adding some sort of new trait/behavior but the trait is only evident if it is the top level. Similarly, we shouldn't try to make a StaticArray continually wrap other stuff to keep its size information statically known to other function.

Of course, the situation is different if the wrapper actually changes the layout of an array. For example, Diagonal or UpperTriangular.

Tokazama avatar Apr 03 '21 03:04 Tokazama

This is partially true, but in practice it will be very difficult for axis indexing to work correctly through multiple outer wrappers. Axis lookup wrappers like DimArray usually leave the wrapped object as-is, but add axis keys that match it's apparent axes - this is hard to deal with as an internal wrapper, where the outer axes are unkown, and easy to deal with as an external wrapper, where they are known. In this case most methods are just forwarded to the parent array except the axis indexing part.

Being wrapped by other kind of arrays is awkward in comparison. How can a DimArray handle being wrapped by a SubArray? what happens with a lookup object like X(Near(103.6)) (given that it is passed through at all) that returns an index out of bounds for the outer SubArray but not for the inner DimArray? it probably shouldn't be able to index into data outside of the SubArray. But I'm not sure how to do that in a way that scales to multiple wrappers.

In practice it is much easier to aggressively rewrap to the outside - e.g. look at broadcast for NamedDims/DimensionalData/AxisKeys.

I may be mistaken but maybe @mcabbott has experimented with these alternate strategies somewhere? From my own experience with DimensionalData I think that wrappers that do not affect the objects size, shape or values, but do depend on them, always being on the outside is the simplest option. There is only the caveat of needing some rule for working out what happens when two of these types collides (ie. unwrap DimArray if wrapped with a NamdDimArray - we don't need both).

I would of course be persuaded by a clean working solution that didn't need rewrapping to the outside! but I haven't seen one yet.

Edit: also, for transposes and similar, both ways have their issues, there are definitely some situations where being the outer wrapper is not easier. Traits that notify or changes in axis order and permutation could help here.

rafaqz avatar Apr 03 '21 04:04 rafaqz

Yes I experimented a bit. In fact AxisKeys.jl is one experiment really, at making its wrapper commute with that of NamedDims.jl -- it runs the tests in both orders. But the approach is just to write lots of methods, which is not going to scale well, let alone work for packages which don't know about each other.

It seems hard to know quite what re-wrapping should achieve in general. Broadcasting is easy in that Base defines the scope, likewise views. But making something like padding (from the discourse thread) "work" with array wrappers not designed for it seems messy. Regardless of which ends up on top, what are the rules? When an AxisArray has a StepRange do you extend it? And categorical? What happens to Hermitian or BiDiagonal types? By the time you've answered these, then it seems hard to say the wrapper and the padding don't know about each other anymore. If the shared knowledge lives in some traits package, it seems that this must grow to basically contain all the logic.

mcabbott avatar Apr 03 '21 05:04 mcabbott

Yes there would need to be fairly specific scope. Outside of Base, things like CuArray and DistributedArray (or is is DArray?) are a good example of an array type that should be rewrapped, as they mostly just change the location of the data.

Then, something like a PaddedView changes the array size, which will break the index, but rewrap would still be useful just to keep dimension names - it makes no difference to NamedDims.jl that the size has changed (and yes I have though about extending axis ranges a few times...)

Further in,Transpose etc reorganise the dimension order, which breaks everything unless you handle it. Things are starting to get hard here, but there may be more elegant ways of dealing with transposition generally. Currently we all just add methods manually to handle the axis reordering.

Then there is every other way that shape/size can be altered by a wrapper, that we just can't handle, and shouldn't rewrap.

It seems there is some structure to the situation that we could work with, if it's worth it. We have 1 trait for an array wrapper: resizes_axes. If the axes are permuted, permutes_axes. And if some are added/removed or broken in other ways changes_axes. etc.

rafaqz avatar Apr 03 '21 06:04 rafaqz

I have nearly complete solutions worked for padding/resizing/permuting/etc. The remaining issue has really been arrays with different memory backends, which is the point of #130 and #140.

Tokazama avatar Apr 03 '21 12:04 Tokazama

Ok I'm keen to see what you come up with! Although I'm still concerned that a lot of edge cases will mean we still need dispatch on AbstractDimArray for DimensionalData.jl and similar packages. which requires being the outer wrapper. There are a bunch of changes to Base Julia needed to change that.

rafaqz avatar Apr 03 '21 12:04 rafaqz

There are a bunch of changes to Base Julia needed to change that.

There are some methods that will have issues and we need to put together our own implementation, but that's manageable. Most of these are actually pretty straightforward. "src/indexing.jl" has a lot of that for getindex. There are definitely a few things that need to happen before we can make big strides on some of this. Right now I'm working on abstractions for layouts that incorporate a lot of the stuff that's been done to support LoopVectorization.jl

Tokazama avatar Apr 03 '21 14:04 Tokazama

I like your enthusiasm!! but these wrappers do a lot more than getindex... dimension names are used in all the base functions that have a dims argument.

  • Can those also be handled as an inner wrapper?
  • Will an adjacent axis index be concatenated during cat for an inner wrapper?
  • What about Base._mapreduce_dim reducing the wrapper dims?
  • ...

It would be great but it sounds like a lot of work to get all of this done. Absolutely we should aim high, and in the end a lot of that base code does needs to be fixed so these things are easier.

My proposal here is just a basic incremental improvement on what already exists, but it's also pretty easily achieved.

rafaqz avatar Apr 03 '21 15:04 rafaqz

I should probably write something like "The State of ArrayInterface" thing to answer this properly. I've been doing a pretty terrible job of making stuff known and approachable for the Julia community over the last year. I'm going to finish a PR and start working on that.

Tokazama avatar Apr 03 '21 15:04 Tokazama

That would be great, good to start getting more people on board and contributing, at least ideas and feedback.

rafaqz avatar Apr 03 '21 15:04 rafaqz

BTW #141 is the PR I was referring to, so hopefully I can get started on this soon

Tokazama avatar Apr 12 '21 11:04 Tokazama