AxisArrays.jl

AxisArrays allows repeated values in axis

Open jd-lara opened this issue 4 years ago • 3 comments

It seems that AxisArrays doesn't check that the names in an axis are unique. The MWE below currently runs without error, and it seems it shouldn't.

MWE

julia> using AxisArrays, Random  # randstring comes from the Random stdlib
julia> ax1 = fill(randstring(10), 100);
julia> t = AxisArray(rand(100, 48), ax1, 1:48);
julia> t
2-dimensional AxisArray{Float64,2,...} with axes:
    :row, ["YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG"  …  "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG"]
    :col, 1:48
And data, a 100×48 Array{Float64,2}:
 0.544045   0.706665  0.794698   …  0.146636   0.173689   0.860602
 0.818773   0.665337  0.598364      0.318264   0.779036   0.49364
 0.506061   0.634103  0.429253      0.727027   0.280061   0.105964
 0.188533   0.438404  0.205067      0.803538   0.469894   0.706383
 0.232693   0.520553  0.731804      0.483673   0.0525813  0.735506
 0.708654   0.495217  0.419302   …  0.187662   0.394126   0.286807
 0.0296371  0.145384  0.67488       0.363728   0.976938   0.679723
 0.736628   0.130968  0.583726      0.295144   0.465268   0.2539
 0.537004   0.585194  0.204467      0.491539   0.428528   0.259942
 0.884711   0.496042  0.0305943     0.0416132  0.279033   0.792419
 ⋮                               ⋱  ⋮                     
 0.161877   0.163169  0.645313      0.978483   0.168679   0.731583
 0.0828598  0.774799  0.987582      0.466019   0.214213   0.757673
 0.333942   0.919552  0.512247      0.420129   0.268359   0.412811
 0.843079   0.416188  0.821125      0.0913968  0.298315   0.681747
 0.137244   0.171126  0.953037   …  0.272229   0.2507     0.822746
 0.0817602  0.75145   0.767151      0.988747   0.458262   0.584586
 0.674519   0.518633  0.91036       0.255896   0.724942   0.637565
 0.360949   0.11814   0.974368      0.24273    0.957803   0.365758
 0.497645   0.396735  0.0457641     0.511115   0.099716   0.0692743

jd-lara Feb 17 '21 18:02

> it seems it shouldn't

Why should this be disallowed? If e.g. hcat is going to work, then it will inevitably sometimes produce duplicates. Lookup works by findfirst I think:

julia> t = AxisArray(rand(Int8, 3, 4), ['a', 'a', 'b'], 0:3)
2-dimensional AxisArray{Int8,2,...} with axes:
    :row, ['a', 'a', 'b']
    :col, 0:3
And data, a 3×4 Matrix{Int8}:
  60   -4   -80  -17
 -92   -8   -42   93
  19  -41  -106  -72

julia> t['a',4]
-17

mcabbott Aug 29 '21 16:08

I think that's the problem -- AxisArrays allows you to look up keys that aren't unique, which (I assume) very often indicates a bug, rather than being the user's actual intention.

ParadaCarleton Aug 29 '21 18:08

> which (I assume) very often

But just asserting this doesn't answer the question. Why shouldn't the labels attached to axes be categories or dates or some other information, which may have duplicates?

If you want to ensure they are unique, you can call unique. But if that were built in, then it would be hard to avoid.

mcabbott Aug 29 '21 19:08
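
For readers following the thread, both positions can be illustrated with plain Base Julia, without AxisArrays itself. The sketch below assumes the first-match lookup behaviour that was only guessed at above ("works by findfirst I think"), and `check_unique_labels` is a hypothetical helper name, not part of any package. It shows what a first-match lookup returns when labels repeat, and how a caller who does want uniqueness could opt in to a check before building the array.

```julia
# Plain-Base sketch; nothing here is AxisArrays API.
labels = ['a', 'a', 'b']

# First-match lookup: with duplicate labels only the first position is found.
first_a = findfirst(==('a'), labels)

# All matching positions, for comparison.
all_a = findall(==('a'), labels)

# Hypothetical opt-in check: throw if any label repeats, otherwise return
# the labels unchanged so the call can wrap a constructor argument.
function check_unique_labels(labels)
    allunique(labels) || throw(ArgumentError("axis labels contain duplicates"))
    return labels
end

check_unique_labels(['a', 'b', 'c'])  # passes
```

Under these assumptions, a caller who needs unique names could wrap the axis argument, for example `AxisArray(data, check_unique_labels(ax1), 1:48)`, while callers using categories or dates as labels remain unaffected.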