Distributions.jl ERROR: MethodError: no method matching JointOrderStatistics(::DiscreteNonParametric{Float64, Float64, Vector{Float64}, Vector{Float64}}, ::Int64)

Hello,

I was blithely trying to construct JointOrderStatistics for a discrete univariate distribution, ignoring the very clear documentation that only continuous univariate distributions are supported, and found that it hadn't been defined. Specifically, I'm trying to create such a distribution for the DiscreteNonParametric type of distribution. Is there any interest in building out an implementation for discrete distributions on sets with an order? I would be willing to give it a try.

Code to replicate error:

using Random

using Distributions

function generate_distribution(rng, n = 100)
    # How spiky?
    Spikiness = Uniform(1, 10)
    spikiness = rand(rng, Spikiness)
    unnormalized_weights = rand(rng, Uniform(), n + 1) .^ spikiness
    xs = collect(range(0, 1, n + 1))
    ps = unnormalized_weights ./ sum(unnormalized_weights)
    DiscreteNonParametric(xs, ps)
end

rng = MersenneTwister(20240218)
n = 3
MyDist = generate_distribution(rng)
μ = mean(MyDist)
θ₂ = var(MyDist)
γ₁ = skewness(MyDist)
κ = kurtosis(MyDist)
S = entropy(MyDist)

SampleDist = JointOrderStatistics(MyDist, n)

Feb 20 '24 15:02 bvancil

I originally didn't implement JointOrderStatistics for discrete distributions because its pmf looked really complicated (from https://doi.org/10.1137/1.9780898719062.ch3):

But on a closer look, $r_s - r_{s-1}$ is just the count of $x_s$ in $x$, so it's pretty simple if ranks is 1:n:

function logpdf(d::JointOrderStatistics{<:DiscreteUnivariateDistribution}, x::AbstractVector{<:Real})
    n = d.n
    ranks = d.ranks
    @assert ranks == 1:n

    # below is equivalent to the following but faster since x should be sorted
    # sum(StatsBase.countmap(x)) do (x_i, c_i)
    #     c_i * logpdf(d, x_i) - loggamma(c_i + 1)
    # end + loggamma(n + 1)

    x_current = first(x)
    count_current = 1
    lp_current = logpdf(d.dist, x_current)
    T = typeof(lp_current)
    lp = zero(lp_current) + loggamma(T(n + 1))
    for i in Iterators.drop(eachindex(x), 1)
        x_i = x[i]
        if x_i == x_current
            count_current += 1
        elseif x_i < x_current || !insupport(d.dist, x_i)
            return oftype(lp, -Inf)
        else
            lp += count_current * lp_current - loggamma(T(count_current + 1))
            x_current = x_i
            count_current = 1
            lp_current = logpdf(d.dist, x_i)
        end
    end
    lp += count_current * lp_current - loggamma(T(count_current + 1))
    return lp
end

I'm not the best way to support arbitrary subsets of ranks. I suspect it would require being able to enumerate all elements in the support of the wrapped distribution between two bounds.

If you'd like to put this together into a PR or see a better way to go about it, I'd be happy to review!

Feb 20 '24 22:02 sethaxen

Thanks for your input. Knowing that you're open to a PR, I'd like to try my hand at it.

Feb 20 '24 22:02 bvancil