OffsetArrays.jl

[RFC] when should Colon `:` keep offset information

Open johnnychen94 opened this issue 4 years ago • 3 comments

In a normal 1-based indexing array world, this is clear: : serves as a length placeholder.
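As a quick reminder of the Base behaviour referred to here, a minimal sketch using only Base (the names `B` and `C` are illustrative): the colon is replaced by whatever length makes the element count match, and the resulting axis starts at 1.

```julia
# Base only: `:` stands for "whatever length makes the sizes match".
B = reshape(collect(1:16), 4, 4)
C = reshape(B, 2, :)   # 16 elements / 2 rows, so the colon resolves to 8

size(C)   # (2, 8)
axes(C)   # (Base.OneTo(2), Base.OneTo(8))
```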

While working on #220, #228, and #245, I realized that we don't yet have a clear and consistent definition of the role of : in the OffsetArray world. Let's take reshape as an example; the same question might apply to all other operations where : is allowed, e.g., getindex and setindex!.

As a rule of thumb, I propose keeping offset information whenever doing so is unambiguous. This comes with exactly one extra rule: if all inds inputs are ranges, then the dimension where : is placed keeps the offset of the corresponding dimension of the parent array.

using OffsetArrays

A = OffsetArray(rand(4, 4, 4), -1, -2, -3) # axes(A) == (0:3, -1:2, -2:1)

reshape(A, :) # (0:63,)

reshape(A, 1:8, :) # (1:8, -1:6)
reshape(A, :, 1:8) # (0:7, 1:8)

reshape(A, 1:8, 1:8, :) # (1:8, 1:8, -2:-2)
reshape(A, 1:8, :, 1:8) # (1:8, -1:-1, 1:8)
reshape(A, :, 1:8, 1:8) # (0:0, 1:8, 1:8)

reshape(A, 1:8, 1:2, :) # (1:8, 1:2, -2:1)
reshape(A, 1:8, :, 1:2) # (1:8, -1:2, 1:2)
reshape(A, :, 1:8, 1:2) # (0:3, 1:8, 1:2)

All other cases should be consistent with the Base case. For example:

reshape(A, 8, :) # (1:8, 1:8)

In this case, it's ambiguous whether : is used as a length placeholder or as an axis placeholder, so we should stick to the Base behaviour; otherwise, I can foresee a lot of type piracy being involved.
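To make the proposed rule concrete, here is a minimal sketch that computes only the resulting axes. `proposed_reshape_axes` is a hypothetical helper written for this illustration, not part of OffsetArrays.jl, and it uses Base only. It assumes exactly one : is passed, and it falls back to starting at 1 when the : sits beyond the parent's last dimension, a case the proposal above does not cover.

```julia
# Hypothetical helper (not an OffsetArrays.jl API): axes that the proposed rule
# would assign to `reshape(A, inds...)`, given `axes(A)` as plain ranges.
function proposed_reshape_axes(parent_axes::Tuple, inds::Tuple)
    count(i -> i isa Colon, inds) == 1 ||
        throw(ArgumentError("this sketch expects exactly one Colon"))

    # Length that the colon has to stand for.
    total = prod(length, parent_axes)
    known = 1
    for i in inds
        i isa Colon && continue
        known *= i isa Integer ? Int(i) : length(i)
    end
    total % known == 0 ||
        throw(DimensionMismatch("cannot reshape $total elements into the requested shape"))
    colon_len = total ÷ known

    # The extra rule: offsets are kept only if every non-colon argument is a range.
    all_ranges = all(i -> i isa Colon || i isa AbstractUnitRange, inds)

    return ntuple(length(inds)) do d
        i = inds[d]
        if i isa Colon
            # Assumption: start at 1 when the parent has no dimension `d`.
            keep = all_ranges && d <= length(parent_axes)
            start = keep ? first(parent_axes[d]) : 1
            start:(start + colon_len - 1)
        elseif i isa Integer
            1:Int(i)            # plain sizes behave as in Base
        else
            first(i):last(i)    # explicit ranges are used as given
        end
    end
end

axA = (0:3, -1:2, -2:1)   # axes of the array A from the examples above

proposed_reshape_axes(axA, (:,))           # (0:63,)
proposed_reshape_axes(axA, (1:8, :))       # (1:8, -1:6)
proposed_reshape_axes(axA, (1:8, 1:2, :))  # (1:8, 1:2, -2:1)
proposed_reshape_axes(axA, (8, :))         # (1:8, 1:8), the Base fallback
```

The helper works on axes rather than on arrays so that it stays independent of how an actual implementation would wrap the result.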

johnnychen94, Jun 16 '21 08:06

The present behaviour of making : always start from 1 struck me as sensible. Here are some examples where the proposed rule (if I understood right) would give offsets that seem counter-intuitive:

M = reshape(1.0:16, 0:3, 3:6) .+ 0.0

reshape(M, 0:7, :)  # does this 2nd dimension share an identity with the 3:6 one?

reshape(M, 0:3, 1, :)  # ... yet this 3rd dimension (with the same length) does not?

reshape(M, axes(M,1), :, axes(M,2))  # here I'm inserting a trivial dimension

BTW, TensorCast now handles offsets and some kinds of reshaping. My 2nd and 3rd examples would be written like this; it knows that the last index of B was the last index of M, and thus it will preserve its axis:

@cast B[i,_,j] := M[i,j]

The closest you can get to my 1st example above is @cast A[(i,j),k] := M[i,(j,k)] k in 1:2 but this is more like reshape(M, :, OneTo(2)): the combined index (i,j) always starts at 1 (as does the trivial index created by _).

mcabbott, Jul 15 '21 14:07

This might be relevant:

reshaping is essentially an appeal to the linear index representation, and that differs for 1-dimensional objects (for which the linear index can start anywhere) and higher-dimensional objects (for which linear indexing always starts at 1).

In this sense reshaping for arrays should always use linear indices of the parent, and we view the reshape methods defined here as re-indexing the reshaped array (and unrelated to the Cartesian indices of the parent array). If ranges of indices are specified as arguments to reshape, these become the indices along the corresponding axes of the reshaped array. A colon in that case implies that we retain the axes of the reshaped array along that dimension. This is also consistent with what the OffsetArray constructor does, where a colon retains the axis of the parent array along the corresponding dimension.
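The linear-indexing distinction in the first quoted paragraph can be checked directly. This is a small sketch that assumes OffsetArrays.jl is installed; `v` and `M` are illustrative names.

```julia
using OffsetArrays

v = OffsetArray([10, 20, 30], -1)                     # 1-dimensional, axis 0:2
M = OffsetArray(collect(reshape(1:6, 2, 3)), -1, -1)  # 2-dimensional, axes (0:1, 0:2)

# For a vector, the linear index is the offset index itself.
firstindex(v)     # 0
v[0]              # 10

# For a higher-dimensional array, linear indexing always starts at 1.
firstindex(M)     # 1
M[1] == M[0, 0]   # true
```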

jishnub, Jul 17 '21 09:07

Agreed with @jishnub. The axis produced by ::

  • starts with 1 if there is more than one remaining axis in the parent array (that's the linearindices rule for multidimensional arrays)
  • starts with 1 if the child cannot have the same axis as the parent (e.g., reshaping a 2x6 into a 3x4 with : in the second slot)
  • adopts the parent axis otherwise.

That last bit could merit some thought during the actual implementation with respect to what types get created.

timholy, Jul 18 '21 09:07