Statistics.jl

median with missing data and reduction over a given dimension

Open: Alexander-Barth opened this issue 7 years ago • 1 comment

The following calls to sum and median work:

using Statistics
sum([1. 2. missing; 2. 3. 4.]; dims = 1)
# 1×3 Array{Union{Missing, Float64},2}:
# 3.0  5.0  missing

and

using Statistics
median([1., 2., missing]) # returns missing

However, a call to median with a missing value and a reduction over a specified dimension fails:

julia> median([1. 2. missing; 2. 3. 4.]; dims = 1)
ERROR: MethodError: Cannot `convert` an object of type Missing to an object of type Float64
Closest candidates are:
  convert(::Type{T<:Number}, ::T<:Number) where T<:Number at number.jl:6
  convert(::Type{T<:Number}, ::Number) where T<:Number at number.jl:7
  convert(::Type{T<:Number}, ::Base.TwicePrecision) where T<:Number at twiceprecision.jl:250
  ...
Stacktrace:
 [1] setindex!(::Array{Float64,2}, ::Missing, ::Int64) at ./array.jl:767
 [2] setindex! at ./subarray.jl:293 [inlined]
 [3] macro expansion at ./broadcast.jl:843 [inlined]
 [4] macro expansion at ./simdloop.jl:73 [inlined]
 [5] copyto! at ./broadcast.jl:842 [inlined]
 [6] copyto! at ./broadcast.jl:797 [inlined]
 [7] materialize!(::SubArray{Float64,1,Array{Float64,2},Tuple{Base.OneTo{Int64},Int64},true}, ::Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple},Nothing,typeof(identity),Tuple{Tuple{Missing}}}) at ./broadcast.jl:756
 [8] concatenate_setindex!(::Array{Float64,2}, ::Missing, ::Base.OneTo{Int64}, ::Vararg{Any,N} where N) at ./abstractarray.jl:2005
 [9] inner_mapslices!(::Bool, ::Base.Iterators.Drop{CartesianIndices{1,Tuple{Base.OneTo{Int64}}}}, ::Int64, ::Array{Any,1}, ::Array{Int64,1}, ::Array{Any,1}, ::Array{Union{Missing, Float64},1}, ::Array{Union{Missing, Float64},2}, ::typeof(median!), ::Array{Float64,2}) at ./abstractarray.jl:1986
 [10] #mapslices#109(::Int64, ::Function, ::typeof(median!), ::Array{Union{Missing, Float64},2}) at ./abstractarray.jl:1976
 [11] #mapslices at ./none:0 [inlined]
 [12] _median at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/Statistics/src/Statistics.jl:757 [inlined]
 [13] #median#44 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/Statistics/src/Statistics.jl:755 [inlined]
 [14] (::getfield(Statistics, Symbol("#kw##median")))(::NamedTuple{(:dims,),Tuple{Int64}}, ::typeof(median), ::Array{Union{Missing, Float64},2}) at ./none:0
 [15] top-level scope at none:0

I see this behaviour in Julia 1.0.3 and Julia 1.1.0. I think median should behave like sum here: return missing for the affected column instead of throwing an error. Is this a known issue?

Alexander-Barth, Apr 16 '19 06:04

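It's due to a limitation of mapslices: as the stack trace shows, the output is allocated as an Array{Float64,2}, so the missing returned by median! for a later slice cannot be stored in it. See https://github.com/JuliaLang/julia/pull/31217.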

We should maybe switch to map(median, eachslice(X; dims = ...)) as suggested by @mbauman in that PR.

nalimilan, Apr 17 '19 15:04
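
Until median handles this case itself, the eachslice approach suggested above can be used as a workaround in user code. The following is only a minimal sketch, assuming Julia 1.1 or later (where eachslice is available) and a two-dimensional input; slicewise_median is a hypothetical helper name, not part of Statistics.jl. Note that the dims passed to eachslice is the dimension being iterated over, so reducing over dims = 1 means iterating over eachslice(X; dims = 2).

```julia
using Statistics

# Hypothetical helper (not part of Statistics.jl): median over one dimension
# of a matrix, computed slice by slice so that a `missing` result for one
# slice does not have to fit into a preallocated Float64 array.
function slicewise_median(X::AbstractMatrix; dims::Integer)
    dims in (1, 2) || throw(ArgumentError("dims must be 1 or 2"))
    # Reducing over `dims` means iterating over slices along the other dimension.
    other = dims == 1 ? 2 : 1
    meds = map(median, eachslice(X; dims = other))
    # Reshape so the result has the same layout as `sum(X; dims = dims)`.
    return dims == 1 ? reshape(meds, 1, :) : reshape(meds, :, 1)
end

X = [1. 2. missing; 2. 3. 4.]
slicewise_median(X; dims = 1)  # 1×3 result: 1.5  2.5  missing
```

To ignore missing values instead of propagating them, one could map `v -> median(skipmissing(v))` over the slices, with the caveat that a slice containing only missing values would then throw because the collection is empty.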