Missings.jl
similar behavior
DataArrays and CategoricalArrays special-case similar by initializing the values with missing, but the default behavior in Base is to initialize only the array, leaving its entries #undef. How do we want to handle this?
julia> T = Union{Int, Missing}
Union{Int64, Missings.Missing}
julia> similar(CategoricalVector{T}(3))
3-element CategoricalArrays.CategoricalArray{Union{Int64, Missings.Missing},1,UInt32}:
missing
missing
missing
julia> similar(DataArray(T, 3))
3-element DataArrays.DataArray{Int64,1}:
missing
missing
missing
julia> similar(missings(T, 3))
3-element Array{Union{Int64, Missings.Missing},1}:
#undef
#undef
#undef
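For comparison, a plain Base array can get the DataArrays/CategoricalArrays behavior if the result of similar is filled explicitly. A minimal sketch in Julia 1.x syntax (where Missing lives in Base rather than in Missings.jl). The helper name similar_missing is hypothetical and is not part of Base or Missings.jl:

```julia
# Hypothetical helper (not part of Base or Missings.jl): allocate like `similar`,
# then overwrite every slot with `missing` so no #undef entries can remain.
function similar_missing(A::AbstractArray{T}) where {T}
    # Reject element types that cannot hold `missing` (e.g. plain Int).
    Missing <: T || throw(ArgumentError("eltype $T cannot hold missing"))
    return fill!(similar(A), missing)
end

v = Vector{Union{String, Missing}}(undef, 3)  # slots start out #undef
w = similar_missing(v)                        # every slot is now `missing`
```

The explicit eltype check keeps the helper from silently failing later with a conversion error when the array cannot store missing at all.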
Currently, on 0.7, similar fills the result with missing for isbits element types but not for other types, which makes the behavior hard to predict:
julia> similar(missings(Int, 3))
3-element Array{Union{Missings.Missing, Int64},1}:
missing
missing
missing
julia> similar(missings(String, 3))
3-element Array{Union{Missings.Missing, String},1}:
#undef
#undef
#undef
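The difference comes from storage layout: a Union of isbits types such as Union{Int, Missing} is stored inline, so every slot holds some value (which one is unspecified), whereas a Union involving a non-isbits type such as String stores references that start out unassigned. A small check with isassigned, written in Julia 1.x syntax (the 0.7 REPL above used missings(T, 3) rather than the undef constructor):

```julia
# Julia 1.x: allocate uninitialized vectors directly with `undef`.
a = Vector{Union{Int, Missing}}(undef, 3)     # isbits Union: stored inline
b = Vector{Union{String, Missing}}(undef, 3)  # String is not isbits: stored as references

# Inline storage means every slot is readable, though its value is unspecified.
inline_assigned = all(i -> isassigned(a, i), eachindex(a))

# Reference storage starts out as #undef, so no slot is assigned yet.
refs_assigned = any(i -> isassigned(b, i), eachindex(b))

println((inline_assigned, refs_assigned))  # (true, false)
```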
I think it makes more sense to fill all arrays with missing, which could allow us to get rid of the missings function once we have a short syntax for Union{T, Missing}. After discussing this with @StefanKarpinski, a possible general rule would be to always fill uninitialized arrays with the instance of the first singleton type in the element Union (according to an internal order, which doesn't necessarily correspond to the order in which the user writes the types).
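The proposed rule can be approximated in user code. This is only an illustrative sketch under assumptions: fill_first_singleton! is a hypothetical name, it relies on the internal, unexported Base.uniontypes to list the Union's components in Julia's internal order, and it treats a type as a singleton when it defines an instance (as Missing and Nothing do).

```julia
# Hypothetical sketch of the proposed rule: fill an array with the instance of
# the first singleton type in its element Union, in Julia's internal order.
# Relies on Base.uniontypes, an internal (unexported) Base function.
function fill_first_singleton!(A::AbstractArray{T}) where {T}
    for S in Base.uniontypes(T)
        # Singleton types such as Missing and Nothing have exactly one instance.
        if S isa DataType && isdefined(S, :instance)
            return fill!(A, S.instance)
        end
    end
    return A  # no singleton component: leave A untouched
end

x = fill_first_singleton!(Vector{Union{Int, Missing}}(undef, 3))
```

With an element type such as Union{Nothing, Missing, Int}, whether the array ends up filled with nothing or missing depends entirely on that internal order, which is why the rule would need to be settled in Base.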
Until such a rule is settled (which should be discussed in Base), I'd say we should keep DataArrays and CategoricalArrays as they are. But indeed, the inconsistency isn't great.
See https://github.com/JuliaLang/julia/issues/24939 about array constructors in Base.