Missings.jl

similar behavior

Open cjprybol opened this issue 6 years ago • 2 comments

DataArrays and CategoricalArrays special-case similar by initializing the values with missing, but the default behavior in Base is to allocate the array without initializing its elements, leaving #undef. How do we want to handle this?

julia> T = Union{Int, Missing}
Union{Int64, Missings.Missing}

julia> similar(CategoricalVector{T}(3))
3-element CategoricalArrays.CategoricalArray{Union{Int64, Missings.Missing},1,UInt32}:
 missing
 missing
 missing

julia> similar(DataArray(T, 3))
3-element DataArrays.DataArray{Int64,1}:
 missing
 missing
 missing

julia> similar(missings(T, 3))
3-element Array{Union{Int64, Missings.Missing},1}:
 #undef
 #undef
 #undef

cjprybol, Dec 05 '17 00:12

Currently, on Julia 0.7, similar returns an array filled with missing for isbits element types, but not for other types, which makes the behavior hard to grasp:

julia> similar(missings(Int, 3))
3-element Array{Union{Missings.Missing, Int64},1}:
 missing
 missing
 missing

julia> similar(missings(String, 3))
3-element Array{Union{Missings.Missing, String},1}:
 #undef
 #undef
 #undef
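The difference between the two cases can be checked programmatically rather than by reading the printed output. The following is a minimal sketch, assuming a Julia 1.x release (where Missing and isbitstype live in Base) rather than the 0.7 build used above; it only tests whether slots are assigned and makes no claim about what values the isbits array happens to contain.

```julia
# Assumes Julia 1.x; `Missing` and `isbitstype` are provided by Base there.
# isbits element types are stored inline, other types are stored as references.
@show isbitstype(Int)
@show isbitstype(String)

a = similar(Vector{Union{Missing, Int}}(missing, 3))
b = similar(Vector{Union{Missing, String}}(missing, 3))

# `isassigned` returns false for a slot that would print as #undef.
a_all_assigned = all(i -> isassigned(a, i), eachindex(a))
b_any_assigned = any(i -> isassigned(b, i), eachindex(b))
@show a_all_assigned
@show b_any_assigned
```

Inline-stored elements always count as assigned, whereas reference slots stay unassigned until something is written to them, which is why only the String case prints #undef.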

I think it makes more sense to fill all arrays with missing, which could allow us to get rid of the missings function once we have a short syntax for Union{T, Missing}. From a discussion with @StefanKarpinski, a possible general rule would be to always fill uninitialized arrays with the instance of the first singleton type in the union (according to an internal order, which doesn't necessarily correspond to the order the user typed).

Until this is settled (it should be discussed in Base), I'd say we should keep DataArrays and CategoricalArrays as they are, although the inconsistency indeed isn't great.
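In the meantime, code that needs a missing-filled array regardless of element type can request one explicitly. The sketch below assumes Julia 1.x; similar_missing is a hypothetical helper name used for illustration only, not part of Base, Missings.jl, DataArrays, or CategoricalArrays.

```julia
# Hypothetical helper: allocate an array shaped like `A` whose element type
# also allows `missing`, then fill every slot with `missing`.
function similar_missing(A::AbstractArray, dims::Dims = size(A))
    T = Union{eltype(A), Missing}
    return fill!(similar(A, T, dims), missing)
end

ints = similar_missing([1, 2, 3])        # isbits element type
strs = similar_missing(["a", "b", "c"])  # non-isbits element type

# Base also accepts `missing` as an initializer in the Array constructor.
direct = Array{Union{Missing, String}}(missing, 3)

@show all(ismissing, ints) all(ismissing, strs) all(ismissing, direct)
```

Filling explicitly sidesteps the isbits versus non-isbits distinction, at the cost of one extra pass over the array.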

nalimilan, Dec 05 '17 14:12

See https://github.com/JuliaLang/julia/issues/24939 about array constructors in Base.

nalimilan, Dec 06 '17 09:12