AxisArrays.jl
AxisArrays allows repeated values in axis
It seems that AxisArrays doesn't check that the names in an axis are unique. The MWE below currently works, and it seems it shouldn't.
MWE
julia> using AxisArrays, Random
julia> ax1 = fill(randstring(10), 100);
julia> t = AxisArray(rand(100, 48), ax1, 1:48);
julia> t
2-dimensional AxisArray{Float64,2,...} with axes:
:row, ["YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG" … "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG"]
:col, 1:48
And data, a 100×48 Array{Float64,2}:
0.544045 0.706665 0.794698 … 0.146636 0.173689 0.860602
0.818773 0.665337 0.598364 0.318264 0.779036 0.49364
0.506061 0.634103 0.429253 0.727027 0.280061 0.105964
0.188533 0.438404 0.205067 0.803538 0.469894 0.706383
0.232693 0.520553 0.731804 0.483673 0.0525813 0.735506
0.708654 0.495217 0.419302 … 0.187662 0.394126 0.286807
0.0296371 0.145384 0.67488 0.363728 0.976938 0.679723
0.736628 0.130968 0.583726 0.295144 0.465268 0.2539
0.537004 0.585194 0.204467 0.491539 0.428528 0.259942
0.884711 0.496042 0.0305943 0.0416132 0.279033 0.792419
⋮ ⋱ ⋮
0.161877 0.163169 0.645313 0.978483 0.168679 0.731583
0.0828598 0.774799 0.987582 0.466019 0.214213 0.757673
0.333942 0.919552 0.512247 0.420129 0.268359 0.412811
0.843079 0.416188 0.821125 0.0913968 0.298315 0.681747
0.137244 0.171126 0.953037 … 0.272229 0.2507 0.822746
0.0817602 0.75145 0.767151 0.988747 0.458262 0.584586
0.674519 0.518633 0.91036 0.255896 0.724942 0.637565
0.360949 0.11814 0.974368 0.24273 0.957803 0.365758
0.497645 0.396735 0.0457641 0.511115 0.099716 0.0692743
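The duplicates in this MWE come from fill, which evaluates randstring(10) once and repeats that single string in all 100 slots, so the row axis carries exactly one distinct label. A minimal Base-only sketch confirming this (it does not call AxisArrays; ax_rep and ax_fresh are illustrative names, not from the original):

```julia
using Random  # randstring lives in the Random stdlib

# fill evaluates randstring(10) once and repeats that one string 100 times,
# which is how the MWE ends up with a row axis full of duplicates.
ax_rep = fill(randstring(10), 100)

# A comprehension calls randstring(10) once per element instead.
ax_fresh = [randstring(10) for _ in 1:100]

println(length(unique(ax_rep)))  # prints 1: a single distinct label
println(allunique(ax_rep))       # prints false
println(length(ax_fresh))        # prints 100; these labels are almost surely distinct
```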
> it seems it shouldn't
Why should this be disallowed? If e.g. hcat is going to work, it will inevitably produce duplicates sometimes. Lookup works by findfirst, I think:
julia> t = AxisArray(rand(Int8, 3, 4), ['a', 'a', 'b'], 0:3)
2-dimensional AxisArray{Int8,2,...} with axes:
:row, ['a', 'a', 'b']
:col, 0:3
And data, a 3×4 Matrix{Int8}:
60 -4 -80 -17
-92 -8 -42 93
19 -41 -106 -72
julia> t['a',4]
-17
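Assuming, as suggested above, that label lookup behaves like Base's findfirst (returning the first matching position), the result can be reproduced without AxisArrays. This sketch copies the 3×4 Int8 data shown above into a plain Matrix; it only mimics first-match lookup and is not AxisArrays' actual implementation:

```julia
# Base-only sketch: no AxisArrays here; it mimics first-match label lookup.
labels = ['a', 'a', 'b']
data = Int8[ 60  -4  -80 -17;
            -92  -8  -42  93;
             19 -41 -106 -72]

row = findfirst(isequal('a'), labels)  # first position whose label is 'a'
println(row)                           # prints 1
println(data[row, 4])                  # prints -17, matching t['a',4] above
println(findall(isequal('a'), labels)) # prints [1, 2]: both rows carry label 'a'
```

The second 'a' row is silently unreachable by label under first-match semantics, which is the behavior the rest of the thread debates.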
I think that's the problem: AxisArrays lets you look up keys that aren't unique, which (I assume) very often indicates a bug rather than the user's actual intention.
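To make the alternative concrete, here is a minimal sketch of the stricter behavior being argued for: a lookup that refuses to pick a row when a label is ambiguous. strict_index is a hypothetical helper written for illustration; it is not part of AxisArrays and uses only Base functions:

```julia
# Hypothetical helper (not part of AxisArrays): resolve a label to a position,
# but refuse to guess when the label is missing or occurs more than once.
function strict_index(labels::AbstractVector, key)
    hits = findall(isequal(key), labels)
    isempty(hits) && throw(KeyError(key))
    length(hits) > 1 &&
        throw(ArgumentError("label $(repr(key)) occurs $(length(hits)) times"))
    return only(hits)
end

labels = ['a', 'a', 'b']
println(strict_index(labels, 'b'))  # prints 3: 'b' is unambiguous
try
    strict_index(labels, 'a')       # 'a' matches two positions
catch err
    println(err)                    # reports the ambiguous label instead of returning row 1
end
```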
> which (I assume) very often
But just asserting this doesn't answer the question. Why shouldn't the labels attached to an axis be categories, dates, or other information that may contain duplicates?
If you want to ensure they are unique, you can call unique yourself. But if that check were built into the constructor, it would be hard to avoid.
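A minimal sketch of that opt-in approach, using only Base: unique deduplicates the labels, and allunique tests them without modifying anything. require_unique is a hypothetical helper, not an AxisArrays function; handing its result to the AxisArray constructor is omitted so the example runs without the package:

```julia
# Hypothetical opt-in check: call this on the labels before building an axis
# when duplicates would indicate a bug; skip it when repeats are meaningful
# (e.g. categories or dates).
function require_unique(labels::AbstractVector)
    allunique(labels) || throw(ArgumentError("axis labels contain duplicates"))
    return labels
end

dates_like = ["2020-01", "2020-01", "2020-02"]  # repeated labels, intentionally
println(unique(dates_like))       # prints ["2020-01", "2020-02"]
println(allunique(dates_like))    # prints false
println(require_unique(["x", "y", "z"]))  # unique labels pass through unchanged
```

Keeping the check outside the constructor leaves duplicate-tolerant uses (such as the output of hcat) working, while callers who want strictness add one line.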