StatsBase.jl icon indicating copy to clipboard operation
StatsBase.jl copied to clipboard

Ambiguity detection

Open timholy opened this issue 8 years ago • 8 comments

StatsBase has a number of ambiguities on 0.6:

julia> using StatsBase

julia> using Base.Test

julia> detect_ambiguities(StatsBase, Base, Core)
Skipping StatsBase.hist
Skipping StatsBase.wmean!
Skipping Base.<|
5-element Array{Tuple{Method,Method},1}:
 ((::Base.#kw##stdm)(::Array{Any,1}, ::Base.#stdm, v::AbstractArray{T,N} where N where T<:Real, m::AbstractArray{T,N} where N where T<:Real, dim::Int64) in StatsBase, (::Base.#kw##stdm)(::Array{Any,1}, ::Base.#stdm, v::AbstractArray{T,N} where N where T<:Real, w::StatsBase.AbstractWeights, m::Real) in StatsBase)                                                                 
 (stdm(v::AbstractArray{T,N} where N where T<:Real, w::StatsBase.AbstractWeights, m::Real) in StatsBase at /home/tim/.julia/v0.6/StatsBase/src/moments.jl:129, stdm(v::AbstractArray{T,N} where N where T<:Real, m::AbstractArray{T,N} where N where T<:Real, dim::Int64) in StatsBase at /home/tim/.julia/v0.6/StatsBase/src/moments.jl:153)                                             
 ((::Base.#kw##stdm)(::Array{Any,1}, ::Base.#stdm, v::AbstractArray{T,N} where N where T<:Real, w::StatsBase.AbstractWeights, m::Real) in StatsBase, (::Base.#kw##stdm)(::Array{Any,1}, ::Base.#stdm, v::AbstractArray{T,N} where N where T<:Real, m::AbstractArray{T,N} where N where T<:Real, dim::Int64) in StatsBase)                                                                 
 (varm(A::AbstractArray{T,N} where N where T<:Real, w::StatsBase.AbstractWeights, M::AbstractArray{T,N} where N where T<:Real, dim::Int64) in StatsBase at /home/tim/.julia/v0.6/StatsBase/src/moments.jl:87, varm(A::AbstractArray{T,N} where N where T<:Real, M::AbstractArray{T,N} where N where T<:Real, wv::StatsBase.AbstractWeights, dim::Int64) in StatsBase at deprecated.jl:50) 
 (stdm(v::AbstractArray{T,N} where N where T<:Real, w::StatsBase.AbstractWeights, m::AbstractArray{T,N} where N where T<:Real, dim::Int64) in StatsBase at /home/tim/.julia/v0.6/StatsBase/src/moments.jl:156, stdm(v::AbstractArray{T,N} where N where T<:Real, m::AbstractArray{T,N} where N where T<:Real, wv::StatsBase.AbstractWeights, dim::Int64) in StatsBase at deprecated.jl:50)

In several of my packages I make ambiguity detection part of the tests. Sometimes you want to "subtract" ambiguities caused by dependent packages. Here's an example. (Printing the ambiguities increases the interpretability of failures on Travis.)

timholy avatar May 23 '17 13:05 timholy

I'm still a bit confused by this. Wasn't the point of making method ambiguities runtime errors that you could just keep those that are unlikely to happen in practice? If the standard becomes to test this, wouldn't we be better off with the old compile time errors?

andreasnoack avatar May 23 '17 13:05 andreasnoack

I had started on adding this to the tests in #258.

In this case these should be easily fixable since a couple are for deprecations and one is because of unnecessary type piracy.

ararslan avatar May 23 '17 18:05 ararslan

You can live with them if they aren't likely to be triggered and/or are not easily fixable. One of my favorite examples (long gone, of course) was a method that was ambiguous of you passed it an Image of DateTime; not something anyone is likely to try anytime soon. So I would agree, one shouldn't care about this.

In practice, however, most ambiguities are fixable. For example, I think these are internally generated within StatsBase. Here's an example of triggering one of them:

julia> v = [1,2,3]
3-element Array{Int64,1}:
 1
 2
 3

julia> w = Weights([1,1,1])
3-element StatsBase.Weights{Int64,Int64,Array{Int64,1}}:
 1
 1
 1

julia> stdm(v, w, 2)
ERROR: MethodError: stdm(::Array{Int64,1}, ::StatsBase.Weights{Int64,Int64,Array{Int64,1}}, ::Int64) is ambiguous. Candidates:
  stdm(v::AbstractArray{T,N} where N where T<:Real, m::AbstractArray{T,N} where N where T<:Real, dim::Int64) in StatsBase at /home/tim/.julia/v0.6/StatsBase/src/moments.jl:153
  stdm(v::AbstractArray{T,N} where N where T<:Real, w::StatsBase.AbstractWeights, m::Real) in StatsBase at /home/tim/.julia/v0.6/StatsBase/src/moments.jl:129
Possible fix, define
  stdm(::AbstractArray{T,N} where N where T<:Real, ::StatsBase.AbstractWeights, ::Int64)

I think those are not entirely unreasonable operations for a user to try.

timholy avatar May 23 '17 23:05 timholy

I'm fine with fixing ambiguities if they cause real issues and these look like something we should consider fixing. My objection was mainly to test systematically for all ambiguities. In that case, the compile time errors seem more appropriate. Ideally, the normal tests would reveal any problematic ambiguities by exercising reasonable code paths.

andreasnoack avatar May 24 '17 02:05 andreasnoack

@timholy That particular ambiguity was discussed in #248 after it was merged. The solution should be to use RealArray, RealVector and RealMatrix appropriately, but one of those methods also needs to be moved to Base since Base.stdm(v::RealArray, m::RealArray, dim::Int) is type piracy.

rofinn avatar May 24 '17 02:05 rofinn

Almost six years later, with StatsBase vö.33.21, there are still ambiguities in the module:

julia> using StatsBase

julia> using Test

julia> Test.detect_ambiguities(StatsBase, Base, Core)
5-element Vector{Tuple{Method, Method}}:
 (==(y::Real, x::Union{StatsBase.PValue, StatsBase.TestStat}) in StatsBase at /home/david/.julia/packages/StatsBase/XgjIN/src/statmodels.jl:91, ==(x::AbstractIrrational, y::Real) in Base at irrationals.jl:88)
 (==(x::Union{StatsBase.PValue, StatsBase.TestStat}, y::Real) in StatsBase at /home/david/.julia/packages/StatsBase/XgjIN/src/statmodels.jl:90, ==(x::Real, y::AbstractIrrational) in Base at irrationals.jl:89)
 (StatsBase.TestStat(v) in StatsBase at /home/david/.julia/packages/StatsBase/XgjIN/src/statmodels.jl:80, (::Type{T})(x::AbstractChar) where T<:Union{AbstractChar, Number} in Base at char.jl:50)
 (StatsBase.TestStat(v) in StatsBase at /home/david/.julia/packages/StatsBase/XgjIN/src/statmodels.jl:80, (::Type{T})(x::Base.TwicePrecision) where T<:Number in Base at twiceprecision.jl:266)
 (StatsBase.TestStat(v) in StatsBase at /home/david/.julia/packages/StatsBase/XgjIN/src/statmodels.jl:80, (::Type{T})(z::Complex) where T<:Real in Base at complex.jl:44)

The problem is that these also interfere with the ambiguity tests of my own module, which happen to be using StatsBase. Instead of any potential ambiguities in my own module, I see these issues (as well).

dhanak avatar Mar 27 '23 09:03 dhanak

Almost six years later, with StatsBase vö.33.21, there are still ambiguities in the module:

Hmm, the good news is it's just 5 of them, and they look very easy to fix. Would you be willing to make a PR fixing those, and adding an ambiguities test to make sure any new ones don't creep in?

ParadaCarleton avatar Aug 23 '23 00:08 ParadaCarleton

There's already a PR: https://github.com/JuliaStats/StatsBase.jl/pull/866

devmotion avatar Aug 23 '23 09:08 devmotion