StatsBase.jl
StatsBase.jl copied to clipboard
Throw an error for non-integer `FrequencyWeights`
Currently, it is possible to construct a set of FrequencyWeights with non-integer weights:
julia> FrequencyWeights([.5, .1])
2-element FrequencyWeights{Float64, Float64, Vector{Float64}}:
0.5
0.1
However, the definition of frequency weights only allows for integer-valued weights, and currently-implemented methods either assume that frequency weights are integer-valued and sum up to the sample size, or else check themselves that the weights are integer-valued. This creates a lot of duplicate checks and also increases the risk of an error if someone writes a method forgetting to check whether or not frequency weights are all integer-valued. I propose we move the error upstream so that attempting to construct FrequencyWeights with non-integer values throws an error, rather than throwing an error whenever a function is called on invalid weights.
AFAIK all definitions for frequency weights generalize perfectly to non-integer weights. The only place where we check that weights are integers is in quantile, right? See https://github.com/JuliaStats/StatsBase.jl/pull/436.
I guess we could add a check, but throwing an error would be breaking so it has to wait the next breaking release. We could add a check argument though, keeping it to false for now.
AFAIK all definitions for frequency weights generalize perfectly to non-integer weights. The only place where we check that weights are integers is in
quantile, right? See #436.
I think that's the only place where we check that, but we implicitly assume the frequency weights are counts (rather than proportions): all code for FrequencyWeights assumes that sum(weights) is equal to the sample size at the moment.
(To be clear, my proposal is to check that all the weights are integers, not that all of them are Int. This allows for things like representing large integers as floats.)