math icon indicating copy to clipboard operation
math copied to clipboard

[FR] Add order statistics pdf

Open spinkney opened this issue 4 years ago • 3 comments

Description

When we do CDF regressions, the order statistic pdf updates the target density with the appropriate change of variables. There are 2 possible parameterizations. One involving the k_th order statistic and one involving the quantile. I'm interesting in the quantile version but it's quite easy to swap out since you can swap out k_i with nq_i in the following

image from https://arxiv.org/pdf/2008.06423.pdf.

Example

The code is quite simple given a CDF transformed vector U (i.e. U = F(x) where F(*) is a CDF), a vector of the corresponding quantiles, and the total sample size N (where size(U) can be << N):

 real orderstatistics_lpdf(vector U, data vector q, int N){
        int M = num_elements(q);
        real lpdf = lgamma(N + 1) - lgamma(N * q[1]) - lgamma(N * (1 - q[M]) + 1); // normalizing constant
        lpdf += (N * q[1] - 1) * log(U[1]);
        lpdf += N * (1 - q[M]) * log1m(U[M]);
        
        for (m in 2:M) {
            lpdf += -lgamma(N * (q[m] - q[m - 1])); // normalizing constant
            lpdf += (N * (q[m] - q[m - 1]) - 1) * log(U[m] - U[m - 1]); 
        }
        return lpdf;
    }

I'm assuming q is always data so the derivative is wrt to U. I have it on paper but will add to this post later.

spinkney avatar Oct 19 '21 10:10 spinkney

@bgoodri: Could you take a look at this function? Is it what you'd want for this problem.

@spinkney: We can now declare functions where there is a data declaration so I edited your post to indicate that. Also, that second loop can be vectorized to

vector diff_q = q[2:M] - q[1:M-1];
vector diff_U = U[2:M] - U[1:M-1];
lpdf -= sum(lgamma(N * diff_q));  // drop to encode lupdf
lpdf += sum((N * diff_q - 1) .* log(diff_U));

But if q is data, then you can precompute and store not only diff_q but also the normalizing term -sum(lgamma(N * diff_q)) and the pre-factor (N * diff_q - 1) for the second term. That's awkward as a function, but easy in a Stan program.

bob-carpenter avatar Oct 19 '21 16:10 bob-carpenter

image

spinkney avatar Oct 21 '21 12:10 spinkney

No objection

bgoodri avatar Nov 04 '21 17:11 bgoodri