[FR] Add order statistics pdf
Description
When we do CDF regressions, the order statistic pdf updates the target density with the appropriate change of variables. There are 2 possible parameterizations. One involving the k_th order statistic and one involving the quantile. I'm interesting in the quantile version but it's quite easy to swap out since you can swap out k_i with nq_i in the following
from https://arxiv.org/pdf/2008.06423.pdf.
Example
The code is quite simple given a CDF transformed vector U (i.e. U = F(x) where F(*) is a CDF), a vector of the corresponding quantiles, and the total sample size N (where size(U) can be << N):
real orderstatistics_lpdf(vector U, data vector q, int N){
int M = num_elements(q);
real lpdf = lgamma(N + 1) - lgamma(N * q[1]) - lgamma(N * (1 - q[M]) + 1); // normalizing constant
lpdf += (N * q[1] - 1) * log(U[1]);
lpdf += N * (1 - q[M]) * log1m(U[M]);
for (m in 2:M) {
lpdf += -lgamma(N * (q[m] - q[m - 1])); // normalizing constant
lpdf += (N * (q[m] - q[m - 1]) - 1) * log(U[m] - U[m - 1]);
}
return lpdf;
}
I'm assuming q is always data so the derivative is wrt to U. I have it on paper but will add to this post later.
@bgoodri: Could you take a look at this function? Is it what you'd want for this problem.
@spinkney: We can now declare functions where there is a data declaration so I edited your post to indicate that. Also, that second loop can be vectorized to
vector diff_q = q[2:M] - q[1:M-1];
vector diff_U = U[2:M] - U[1:M-1];
lpdf -= sum(lgamma(N * diff_q)); // drop to encode lupdf
lpdf += sum((N * diff_q - 1) .* log(diff_U));
But if q is data, then you can precompute and store not only diff_q but also the normalizing term -sum(lgamma(N * diff_q)) and the pre-factor (N * diff_q - 1) for the second term. That's awkward as a function, but easy in a Stan program.

No objection