ERROR: MethodError: no method matching JointOrderStatistics(::DiscreteNonParametric{Float64, Float64, Vector{Float64}, Vector{Float64}}, ::Int64)
Hello,
I was blithely trying to construct JointOrderStatistics for a discrete univariate distribution, ignoring the very clear documentation that only continuous univariate distributions are supported, and found that it hadn't been defined. Specifically, I'm trying to create such a distribution for the DiscreteNonParametric type of distribution. Is there any interest in building out an implementation for discrete distributions on sets with an order? I would be willing to give it a try.
Code to replicate error:
using Random
using Distributions
function generate_distribution(rng, n = 100)
# How spiky?
Spikiness = Uniform(1, 10)
spikiness = rand(rng, Spikiness)
unnormalized_weights = rand(rng, Uniform(), n + 1) .^ spikiness
xs = collect(range(0, 1, n + 1))
ps = unnormalized_weights ./ sum(unnormalized_weights)
DiscreteNonParametric(xs, ps)
end
rng = MersenneTwister(20240218)
n = 3
MyDist = generate_distribution(rng)
μ = mean(MyDist)
θ₂ = var(MyDist)
γ₁ = skewness(MyDist)
κ = kurtosis(MyDist)
S = entropy(MyDist)
SampleDist = JointOrderStatistics(MyDist, n)
I originally didn't implement JointOrderStatistics for discrete distributions because its pmf looked really complicated (from https://doi.org/10.1137/1.9780898719062.ch3):
But on a closer look, $r_s - r_{s-1}$ is just the count of $x_s$ in $x$, so it's pretty simple if ranks is 1:n:
function logpdf(d::JointOrderStatistics{<:DiscreteUnivariateDistribution}, x::AbstractVector{<:Real})
n = d.n
ranks = d.ranks
@assert ranks == 1:n
# below is equivalent to the following but faster since x should be sorted
# sum(StatsBase.countmap(x)) do (x_i, c_i)
# c_i * logpdf(d, x_i) - loggamma(c_i + 1)
# end + loggamma(n + 1)
x_current = first(x)
count_current = 1
lp_current = logpdf(d.dist, x_current)
T = typeof(lp_current)
lp = zero(lp_current) + loggamma(T(n + 1))
for i in Iterators.drop(eachindex(x), 1)
x_i = x[i]
if x_i == x_current
count_current += 1
elseif x_i < x_current || !insupport(d.dist, x_i)
return oftype(lp, -Inf)
else
lp += count_current * lp_current - loggamma(T(count_current + 1))
x_current = x_i
count_current = 1
lp_current = logpdf(d.dist, x_i)
end
end
lp += count_current * lp_current - loggamma(T(count_current + 1))
return lp
end
I'm not the best way to support arbitrary subsets of ranks. I suspect it would require being able to enumerate all elements in the support of the wrapped distribution between two bounds.
If you'd like to put this together into a PR or see a better way to go about it, I'd be happy to review!
Thanks for your input. Knowing that you're open to a PR, I'd like to try my hand at it.