Statistics.jl

Mean of an array with missing values does not work if the dims argument is provided

Open oxinabox opened this issue 6 years ago • 6 comments

julia> using Statistics

julia> x = [(i==3 && j==3) ? missing : i*j for i in 1:3, j in 1:4]
3×4 Array{Union{Missing, Int64},2}:
 1  2  3          4
 2  4  6          8
 3  6   missing  12

julia> mean(x)
missing

julia> mean(x; dims=1)
ERROR: MethodError: Cannot `convert` an object of type Missing to an object of type Float64
    # SNIP 
julia> mean(x; dims=2)
ERROR: MethodError: Cannot `convert` an object of type Missing to an object of type Float64
    # SNIP

Expected behavior:

julia> mean(x; dims=1)
1×4 Array{Union{Missing,Float64},2}:
 2.0  4.0  missing  8.0

julia> mean(x; dims=2)
3×1 Array{Union{Missing,Float64},2}:
 2.5
 5.0
 missing

(credit to @nickrobinson251 who found this)
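Until `mean` supports this directly, one possible workaround is to reduce each slice separately and let `map` pick a result eltype wide enough to hold `missing`. This is only a sketch, assuming Julia 1.1 or later (for `eachcol`/`eachrow`); the variable names are illustrative and not part of Statistics.jl.

```julia
using Statistics

x = [(i == 3 && j == 3) ? missing : i * j for i in 1:3, j in 1:4]

# Reduce each slice on its own; `map` widens the result eltype as needed.
col_means = map(mean, eachcol(x))   # one value per column, like dims=1
row_means = map(mean, eachrow(x))   # one value per row, like dims=2

# Reshape to the shapes a `dims` reduction would return.
means_dims1 = permutedims(col_means)      # 1×4 matrix
means_dims2 = reshape(row_means, :, 1)    # 3×1 matrix
```

Slices that contain a `missing` come out as `missing`, which matches the expected output above.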

oxinabox, Mar 19 '19 19:03

It appears to be because of this: https://github.com/JuliaLang/julia/blob/master/stdlib/Statistics/src/Statistics.jl#L134

julia> x = [(i == j == 3) ? missing : i * j for i in 1:3, j in 1:4]
3×4 Array{Union{Missing, Int64},2}:
 1  2  3          4
 2  4  6          8
 3  6   missing  12

julia> Base.reducedim_init(t->t/2, +, x, 1)
1×4 Array{Float64,2}:
 0.0  0.0  0.0  0.0

That array gets passed to mean!, which tries to store the result in it, but it can't: an Array{Float64} has no way to hold the missing.
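The failing store can be reproduced in isolation. The snippet below is a minimal sketch that does not call `mean!` itself; it only shows the same conversion failing on a preallocated `Float64` array, and succeeding once the eltype allows `missing`. The names `out` and `out_ok` are illustrative.

```julia
out = zeros(Float64, 1, 4)   # same eltype as the array shown above

# Storing `missing` needs convert(Float64, missing), which has no method.
err = try
    out[1, 3] = missing
    nothing
catch e
    e
end
println(typeof(err))         # MethodError

# With a Union eltype the same assignment succeeds.
out_ok = convert(Array{Union{Missing, Float64}}, out)
out_ok[1, 3] = missing
```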

ararslan, Mar 19 '19 19:03

Going further, we get that result from reducedim_init because of this line: https://github.com/JuliaLang/julia/blob/master/base/reducedim.jl#L117, since zero(Union{Missing,Int}) === 0.
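A short sketch of why the initial array ends up as `Float64`: the zero of the `Union` type is a plain `Int`, so everything derived from it has already lost the `Missing` part. This mirrors the `t -> t/2` function from the call above but does not reproduce the actual `reducedim_init` code.

```julia
T = Union{Missing, Int}

z = zero(T)        # the Missing part of the Union is dropped here
halved = z / 2     # mirrors the t -> t/2 function passed to reducedim_init

println(z === 0)                   # true
println(typeof(halved + halved))   # Float64
```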

ararslan, Mar 19 '19 19:03

sum handles this just fine, so I think we should change mean to go through more of the sum machinery.

ararslan, Mar 19 '19 19:03

julia> using Statistics

julia> a = [1 2;
            missing 3]
2×2 Array{Union{Missing, Int64},2}:
 1         2
  missing  3

julia> sum(skipmissing(a), dims=1)
ERROR: MethodError: no method matching sum(::Base.SkipMissing{Array{Union{Missing, Int64},2}}; dims=1)
Closest candidates are:
  sum(::Any) at reduce.jl:503 got unsupported keyword argument "dims"
  sum(::Any, ::AbstractArray; dims) at reducedim.jl:653
  sum(::Any, ::Any) at reduce.jl:486 got unsupported keyword argument "dims"
  ...
Stacktrace:
 [1] top-level scope at REPL[35]:1

sum also doesn't work now?

Moelf, Jul 17 '20 01:07

@Moelf sum(skipmissing(...), dims=...) has never worked. https://github.com/JuliaLang/julia/pull/28027/ implements it but it's not been merged (yet?).
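In the meantime, a per-slice reduction gives the same effect. This is a sketch assuming Julia 1.1 or later for `eachcol`; it is a workaround, not what the linked PR implements.

```julia
a = [1 2;
     missing 3]

# Skip missings within each column, then sum that column.
col_sums = map(col -> sum(skipmissing(col)), eachcol(a))
sums_dims1 = permutedims(col_sums)   # 1×2 matrix, like dims=1
```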

This failure is also interesting:

julia> mean([missing 1; 3 4]; dims=1)
ERROR: InexactError: Int64(2.5)

It happens because we use the type of first(x)/1 to choose the eltype of the result, which in this case is Missing.

> sum handles this just fine, so I think we should change mean to go through more of the sum machinery.

Actually, no, sum only handles this correctly in cases where Base.add_sum returns the same type as its inputs. But it fails when the type differs (which is precisely what mean does for integer inputs):

julia> sum([missing true; false false], dims=1)
ERROR: MethodError: Cannot `convert` an object of type Missing to an object of type Int64

The only way to fix this is to redesign the reducedim code, as _reducedim_init is just a hack when the type isn't concrete. AFAICT the only 100% correct solution is to do as with map and broadcast: allocate an array using the type of the first element, and widen its eltype if necessary.
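To illustrate the allocate-then-widen idea, here is a rough sketch for the column-wise case only. `colwise_widen` is a hypothetical helper, not an existing Base or Statistics function, and it widens with a plain `Union` rather than whatever type-joining logic `map` and `broadcast` use internally.

```julia
using Statistics

# Hypothetical helper: apply `f` to each column, widening the output eltype on demand.
function colwise_widen(f, A::AbstractMatrix)
    cols = collect(eachcol(A))
    isempty(cols) && throw(ArgumentError("A must have at least one column"))
    first_val = f(cols[1])
    # Start with an array typed after the first result only.
    out = Vector{typeof(first_val)}(undef, length(cols))
    out[1] = first_val
    for i in 2:length(cols)
        v = f(cols[i])
        if !(v isa eltype(out))
            # Widen: move the results so far into an array that can also hold `v`.
            T = Union{eltype(out), typeof(v)}
            wider = Vector{T}(undef, length(out))
            copyto!(wider, 1, out, 1, i - 1)
            out = wider
        end
        out[i] = v
    end
    return reshape(out, 1, :)   # 1×n, like a dims=1 reduction
end

x = [(i == 3 && j == 3) ? missing : i * j for i in 1:3, j in 1:4]
widened = colwise_widen(mean, x)
leading_missing = colwise_widen(mean, [missing 1; 3 4])
```

The second call covers the case where the first column already yields `missing`: the output starts out with eltype `Missing` and is widened once a `Float64` shows up.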

nalimilan, Aug 06 '20 13:08

Any news/updates/new ideas on this?

jo-fleck, Aug 17 '22 21:08