sortkeys() changes key container type
I think it is reasonable to expect that sorting keys should not change the key container type. This is not currently the case:
using AxisKeys, UniqueVectors
a = wrapdims(rand(2), UniqueVector, x=1:2)
println(typeof(axiskeys(a,:x)))
a = sortkeys(a)
println(typeof(axiskeys(a,:x)))
Produces:
UniqueVector{Int64}
Array{Int64,1}
That would be nice to have, but seems tricky to ensure. Internally, sortkeys does essentially this:
a = wrapdims(rand(2), UniqueVector, x=1:2)
perm = sortperm(a.x)
wrapdims(parent(a)[perm], x=a.x[perm])
and a.x[perm] doesn't know that perm is a permutation: getindex must equally expect e.g. a.x[[1,1,2]], which repeats a key.
It might be possible to call sort(a.x) again, and trust that this will produce the same order? UniqueVectors can (and probably should) overload this to preserve the type.
UniqueVectors also produce an Array after sort(), but the in-place sort!() works. So this should do it for this container:
perm = sortperm(a.x)
newkeys = copy(a.x)
sort!(newkeys)
wrapdims(parent(a)[perm], x=newkeys)
Not sure how general this is for other containers, or whether sort! and sortperm are guaranteed to produce the same order.
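On the ordering question, here is a quick Base-only spot check (not a proof). When all keys are distinct under the ordering, the sorted sequence is unique, so any correct sort has to agree with the permutation from sortperm. With `by` or `lt` arguments that treat distinct keys as equal, the two could differ unless both use a stable algorithm. The helper name `same_order` is made up for this sketch.

```julia
# Hypothetical helper: does permuting by sortperm give the same
# sequence as sort! applied to a copy of the keys?
function same_order(ks::AbstractVector; kwargs...)
    perm = sortperm(ks; kwargs...)        # permutation that sorts ks
    sorted = sort!(copy(ks); kwargs...)   # in-place sort of a copy
    return ks[perm] == sorted
end

ks = [41, 46, 19, 47, 21, 27, 16, 25, 45]  # distinct keys, as in a UniqueVector
same_order(ks)              # default ascending order
same_order(ks; rev=true)    # descending order
```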
I get an error when I try sort!:
u = UniqueVector([41, 46, 19, 47, 21, 27, 16, 25, 45])
findfirst(isequal(u[5]), u) # fast method
su = sort!(u) # ArgumentError: cannot set an element that exists elsewhere in UniqueVector
@which sort!(u) # generic one, from Base.Sort
But also, I don't think this can work in general, e.g. when the keys are a range rather than a mutable vector:
b = wrapdims(rand(3), y = 'c':-1:'a')
sortkeys(b, dims=:y)
Indeed. UniqueVector seems to have even more fundamental problems, as views or indexing also change the key container type.
A possible solution is to define a function that enforces the key containers:
# Rebuild K with every key vector converted to container_type.
convert_kc(K::KeyedArray, container_type::Type=UniqueVector)::KeyedArray =
    KeyedArray(NamedDimsArray(parent(parent(K)), dimnames(K)), map(container_type, axiskeys(K)))
and then call it after each operation that may need accelerated access in its output:
K = convert_kc(sortkeys(K))
K = convert_kc([K1 ; K2])
...
This is not pretty. Maybe KeyedArray could have a static Bool type parameter that, if true, calls this key container conversion at the end of each AxisKeys function that generates a KeyedArray result?
Simple views behave well, e.g. @which findfirst(isequal(47), view(u, 2:7)). But for u[2:7] the cost of re-generating the lookup dictionary was felt to be too much, in https://github.com/garrison/UniqueVectors.jl/pull/9. Maybe you could do better: instead of re-hashing, just update the indices in the dictionary? Although u[inds] isn't always unique...
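A rough sketch of the "just update the indices" idea for a contiguous range, using only Base. The `lookup` dictionary (value => position) and the name `shift_lookup` are stand-ins invented for illustration; how UniqueVector actually stores its lookup is not assumed here, and whether `copy`, `filter!` and `map!` on a Dict really avoid re-hashing depends on Dict internals.

```julia
# Hypothetical helper: derive the lookup for u[r] from the lookup for u,
# where r is a contiguous range, so the result stays unique.
function shift_lookup(lookup::Dict{T,Int}, r::UnitRange{Int}) where {T}
    new = copy(lookup)                          # leave the original untouched
    filter!(kv -> last(kv) in r, new)           # drop entries outside the range
    map!(i -> i - first(r) + 1, values(new))    # renumber positions to start at 1
    return new
end

vals = [41, 46, 19, 47, 21, 27, 16, 25, 45]
lookup = Dict(v => i for (i, v) in enumerate(vals))
sub = shift_lookup(lookup, 2:7)
sub[47]   # position of 47 within vals[2:7]
```

This only covers ranges; a general u[inds] with repeated indices would still need a uniqueness check.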
Base has a function permute! but no permute, which would be the perfect thing to overload here.
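To make the permute idea concrete, here is a minimal sketch. `permute` is hypothetical (Base does not define it), and `ToyUnique` is a toy stand-in container written for this example, not the real UniqueVector. The generic fallback just indexes, while a container can overload it to keep its own type, since a permutation cannot introduce duplicates.

```julia
# Hypothetical non-mutating counterpart of Base.permute!.
# Generic fallback: plain indexing, which may change the container type.
permute(v::AbstractVector, perm::AbstractVector{<:Integer}) = v[perm]

# Toy stand-in for a uniqueness-preserving container (NOT UniqueVector).
struct ToyUnique{T} <: AbstractVector{T}
    items::Vector{T}
end
Base.size(u::ToyUnique) = size(u.items)
Base.getindex(u::ToyUnique, i::Int) = u.items[i]
Base.IndexStyle(::Type{<:ToyUnique}) = IndexLinear()

# Overload: a valid permutation keeps elements unique, so keep the type.
function permute(u::ToyUnique, perm::AbstractVector{<:Integer})
    length(perm) == length(u) && isperm(perm) ||
        throw(ArgumentError("perm is not a permutation of the indices"))
    return ToyUnique(u.items[perm])
end

u = ToyUnique([41, 46, 19, 47])
perm = sortperm(u)
u[perm] isa ToyUnique            # false: generic getindex returns a plain Vector
permute(u, perm) isa ToyUnique   # true: the overload keeps the container type
```

sortkeys could then call such a function instead of a.x[perm], with the fallback covering key containers that don't overload it.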
A difficulty with a boolean flag is that the function needed to reconstruct a given type isn't obvious from the type. It could carry around this function, though. Still adds a fair bit of complication.