DimensionalData.jl icon indicating copy to clipboard operation
DimensionalData.jl copied to clipboard

Extending Base.stack for DimArrays

Open brendanjohnharris opened this issue 1 year ago • 8 comments

Adds methods for Base.stack, and related non-exported functions from Base, that are compatible with DimArrays.

Syntax follows Base: stacking DimArrays along a given axis dims creates a new dimension. However, existing dimension data is preserved, and the new dimension becomes an AnonDim.

Optionally, a Dimension dim can be provided as the first argument to stack, in which case the new dimension is assigned as dim rather than AnonDim.

brendanjohnharris avatar Feb 23 '24 06:02 brendanjohnharris

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 83.99%. Comparing base (9846bfe) to head (884ab68). Report is 41 commits behind head on main.

:exclamation: Current head 884ab68 differs from pull request most recent head 2c7b8b9

Please upload reports for the commit 2c7b8b9 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #645      +/-   ##
==========================================
+ Coverage   83.83%   83.99%   +0.15%     
==========================================
  Files          45       45              
  Lines        4102     4136      +34     
==========================================
+ Hits         3439     3474      +35     
+ Misses        663      662       -1     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Feb 23 '24 06:02 codecov-commenter

Thanks for the suggestions! Went through and cleaned up the PR.

The Base.stack(dim, X; dims) method lets you supply a Dimension to assign to the newly created array dimension; not strictly necessary, but more convenient than setting or rebuilding the resulting array to overwrite the new AnonDim (is an AnonDim a suitable way to treat a newly created dimension, as I've done here?).

Let me know if you have more thoughts on this, happy to work on it further :)

brendanjohnharris avatar Feb 27 '24 05:02 brendanjohnharris

We try not to change base method signatures besides allowing dims to be named. To keep the mental model very simple.

I would just put that new Dimension in dims like cat does. If its a constructed dimension it replaces the existing one. Otherwise Type/Symbol etc just choose them. Just check d isa Dimension

You will also need to format(newdims, A) afterwards.

rafaqz avatar Feb 27 '24 07:02 rafaqz

Base.stack behaves differently than Base.cat in that it always creates a new dimension (i.e. Base.stack([A, B,...]; dims=2) inserts a dimension at position 2, increasing ndims(A) by 1); in general this new dimension is unrelated to the existing array dimensions, so we need to specify both position (dims) and a Dimension (dim).

I see your point about keeping new signatures to a minimum, though. How about allowing the keyword argument dims to be assigned as an 'Integer=>Dimension' pair, with the following behavior:

  1. If dims isa Integer: The new dimension is an AnonDim at position dims
  2. If dims isa Dimension: The new dimension is a DImension dims at position ndims(A)+1
  3. If dims isa Pair{Integer, Dimension}: The new dimension is a Dimension last(dims) at position first(dims)

We could also, instead, have that the new dimension is always an AnonDim, leaving it up to the user to set that dimension at a later time; in that case, however, there is ambiguity if there are any existing AnonDims in the input arrays.

brendanjohnharris avatar Feb 28 '24 03:02 brendanjohnharris

Using a Pair sounds like a good solution (as do points 1 and 2 as simpler cases). For 2 I guess putting it as the last dimension would be the majority use-case? (I rarely use stack)

Always AnonDim will require using set all the time with AnonDim => somedim. We may as well just put that pair in stack and skip the step.

rafaqz avatar Feb 28 '24 21:02 rafaqz