CategoricalArrays.jl
Add `Vector` conversion
Currently there is an inconsistency between the Array and Vector conversions for categorical arrays:
julia> x = CategoricalArray([1, 1, 1, 2, 2, 3]);
julia> Array(x)
6-element Array{Int64,1}:
1
1
1
2
2
3
julia> Vector(x)
6-element Array{CategoricalValue{Int64,UInt32},1}:
1
1
1
2
2
3
As mentioned on Slack, a better solution overall would be to make Vector return a CategoricalArray and have an explicit uncategorize method.
Vector cannot return a CategoricalArray, as Vector has a very concrete meaning in Base (it should call a Vector constructor that allocates a fresh Vector, as opposed to e.g. convert functions).
Good catch. These indeed have to return an Array, but we should decide whether to return an Array{Int} or an Array{<:CategoricalValue{Int}}. This is related to whether we keep the current similar and collect overloads, which ensure Array{<:CategoricalValue} is never produced: if we drop them, we could return Array{<:CategoricalValue{Int}}, which is somewhat more logical than Array{Int}.
EDIT: though reading the Slack thread it seems that Array{Int} is what was expected (in that case at least)
It was expected for consistency with Array. However, I think that in these cases we could allow some flexibility and change the behaviour if it helps in other areas.
> if we drop them, we could return Array{<:CategoricalValue{Int}}, which is somewhat more logical than Array{Int}.
I think most people converting to an Array are just saying "I want out! give me numbers again!". I think Array{<:CategoricalValue{Int}} is fine as long as we give people an explicit decategorize command.
broadcasted get now almost does it (except that it fails on missing)
Do we have a plan for get to work with missing?
I just had a frustrating experience with factors in R and thought Julia would be nicer, but this is still annoying.
passmissing(get)?
Ah that does work. Fair enough!
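For readers unfamiliar with the trick, here is a minimal sketch of the idea behind passmissing. It does not use Missings.jl or CategoricalArrays.jl: `passmissing_sketch` and `strict_double` are hypothetical names made up for illustration, with `strict_double` standing in for a function that, like get here, has no method for missing. The real passmissing also handles multiple arguments, which this sketch does not.

```julia
# Hypothetical single-argument stand-in for Missings.passmissing:
# wrap f so that a missing input is returned as-is instead of reaching f.
passmissing_sketch(f) = x -> ismissing(x) ? missing : f(x)

# Hypothetical function with no method for Missing (plays the role of `get`).
strict_double(x::Int) = 2x

data = [1, missing, 3]

# Broadcasting strict_double directly would throw a MethodError on missing;
# the wrapped version propagates missing instead.
result = passmissing_sketch(strict_double).(data)
```

The wrapper keeps the underlying function strict and leaves the decision to propagate missing with the caller.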
I don't really understand the choice of get to be honest, since a categorical value isn't a collection. If we made our own function we could define it for missing. But I appreciate the need to get to 1.0 and it's not that big a deal.
Actually get is also my long-standing gripe https://github.com/JuliaData/CategoricalArrays.jl/issues/142. I would change it if @nalimilan supported this (and then passmissing would not be needed).
AFAICT we agreed on using unwrap at #142. We just need to move the definition from Tables.jl to DataAPI.jl.
@quinnj - would you have time to make this move? Then we can update CategoricalArrays.jl
PR up: https://github.com/JuliaData/DataAPI.jl/pull/35. So I realized we don't really need to move anything from Tables.jl; the definitions there are...not really related and not really necessary. Like, they're not useful generically. So we just define the single unwrap(x) = x definition in DataAPI.jl that CategoricalArrays can overload.
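A rough sketch of why the single generic fallback is enough, under stated assumptions: the `unwrap` defined below is a local stand-in for the DataAPI.jl definition, and `Wrapped` is a hypothetical type playing the role of CategoricalValue. Nothing here is the actual package code.

```julia
# Local stand-in for the generic fallback proposed for DataAPI.jl.
unwrap(x) = x

# Hypothetical wrapper type standing in for CategoricalValue.
struct Wrapped{T}
    value::T
end

# The overload a package such as CategoricalArrays.jl would add for its own type.
unwrap(x::Wrapped) = x.value

xs = Union{Wrapped{Int}, Missing}[Wrapped(1), missing, Wrapped(3)]

# missing dispatches to the generic fallback, so no passmissing is needed.
plain = unwrap.(xs)
```

Because missing simply hits the identity fallback, broadcasting unwrap over a vector that mixes wrapped values and missing works without any special casing.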