extend gpdfit to return marginal posterior of k #186
Addresses #186: @jgabry @avehtari
Extended `gpdfit()` to return the marginal posterior density of k in addition to k-hat and sigma-hat. We could also create a separate function, but this seemed like the better approach to me.
- Updated the profile log-likelihood calculation by using an outer product (`%o%`) instead of `vapply()`, which made it more efficient (a quick equivalence check follows the snippet below):
```r
# switched to
k <- matrixStats::rowMeans2(log1p(-theta %o% x))
N * (log(-theta / k) - k - 1)

# instead of
a <- -theta
k <- vapply(a, FUN = function(a_i) mean(log1p(a_i * x)), FUN.VALUE = numeric(1))
N * (log(a / k) - k - 1)
```
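A quick way to convince ourselves the two versions agree, on toy inputs (the values here are hypothetical, not from the PR):

```r
# equivalence check of the outer-product and vapply versions on toy data
set.seed(1)
x <- sort(exp(rnorm(50)))
theta <- seq(-2, -0.1, length.out = 10)

k_outer <- matrixStats::rowMeans2(log1p(-theta %o% x))

a <- -theta
k_vapply <- vapply(a, function(a_i) mean(log1p(a_i * x)), numeric(1))

stopifnot(all.equal(k_outer, k_vapply))
```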
- Dropped the `lx()` function and moved its updated content to `gpdfit()` to avoid repeated calculations, because the code example in the issue thread actually computes the `k` values twice (a sketch of the merged computation follows the snippet):
```r
# first (within the `lx()` function)
l_theta <- N * lx(theta, x) # profile log-lik

# second
ks <- numeric(length = M)
for (i in 1:M) {
  ks[i] <- mean.default(log1p(-theta[i] * x))
}
```
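For illustration, a sketch of the merged computation, assuming `gpdfit()`'s internal variables (the `theta` grid, sorted `x`, and `N`); the `logSumExp` line is one way to normalize, not necessarily what the final code does:

```r
# compute k once per theta and reuse it for the profile log-likelihood
k <- matrixStats::rowMeans2(log1p(-theta %o% x))           # one k per theta
l_theta <- N * (log(-theta / k) - k - 1)                   # profile log-lik, reuses k
w_theta <- exp(l_theta - matrixStats::logSumExp(l_theta))  # normalized weights
```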
- Following the notation of Zhang, J., and Stephens, M. A. (2009), I suggest new names for the `gpdfit()` output components:
```r
# switched to
nlist(k_hat, sigma_hat, k, w_theta)

# instead of
nlist(k, sigma)
```
This name change shouldn't really cause backward compatibility issues, but it's not that important anyway; I'll use whatever you believe is more appropriate.
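For reference, with the proposed names the output would be accessed like this (hypothetical until a naming decision is made):

```r
fit <- gpdfit(x)
fit$k_hat     # point estimate of k (previously fit$k)
fit$sigma_hat # point estimate of sigma (previously fit$sigma)
fit$k         # quadrature points for the marginal posterior of k
fit$w_theta   # normalized quadrature weights at those points
```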
@avehtari - I can update the function to estimate the normalization term using the trapezoidal rule once you decide what to return. Below are some options:
```r
# normalization term estimation with the trapezoidal rule
c <- 2 / sum((w_theta[-M] + w_theta[-1]) * (k[-M] - k[-1]))

# normalized densities
d_theta <- c * w_theta

# some return options:
nlist(k_hat, sigma_hat, k, w_theta)          # normalized quadrature weights
nlist(k_hat, sigma_hat, k, d_theta)          # normalized densities
nlist(k_hat, sigma_hat, k, w_theta, c)       # normalized quadrature weights + density normalization term
nlist(k_hat, sigma_hat, k, w_theta, d_theta) # normalized quadrature weights + normalized densities
```
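If it helps, the normalization could also be factored into a small helper; a hypothetical sketch (not part of loo), with `abs()` so it works whether the grid is increasing or decreasing:

```r
# trapezoidal rule over a possibly non-equidistant grid
trapz <- function(x, y) {
  abs(sum((y[-length(y)] + y[-1]) * diff(x))) / 2
}

# then the normalization above becomes:
# c <- 1 / trapz(k, w_theta)
# d_theta <- c * w_theta
```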
I noticed a small side effect of this recent commit: when the estimation doesn't work, k-hat and sigma-hat are not necessarily `Inf` and `NaN`, respectively:
```r
set.seed(147)
x <- exp(rnorm(10))

# CRAN version (without the commit)
gpdfit(x, sort_x = FALSE)
$k
[1] Inf
$sigma
[1] NaN

# GitHub version on master (with the commit)
gpdfit(x, sort_x = FALSE)
$k
[1] NA
$sigma
[1] NA
```
Should we avoid `NA`s and always return `Inf` and `NaN` here? I can also suppress the "NaNs produced" warnings that occur in these cases.
> Should we avoid `NA`s and always return `Inf` and `NaN` here?
I would prefer khat=Inf when the failure is due to an extremely long tail. At least in the cases where I've seen Inf appearing, the tail was very long.
Thanks for the PR!
> Should we avoid `NA`s and always return `Inf` and `NaN` here?
Yeah I think Inf for k and NaN for sigma would be better, thanks!
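A minimal sketch of that behavior, using the names proposed above (the exact placement inside `gpdfit()` may differ):

```r
# map failed fits back to the CRAN-style sentinel values instead of NA
if (is.na(k_hat)) {
  k_hat <- Inf
  sigma_hat <- NaN
}
```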
Based on the discussion in the issue thread, we still need to decide whether we want to return normalized quadrature weights or normalized densities.
I haven't had a chance to think much about this. I'm ok with whatever @avehtari decides.
Thinking about it for a moment, I think normalized densities would make more sense; if someone would like to use normalized quadrature weights, it's easier to obtain them from normalized densities than vice versa.
Quick question -- is the `gpd_fit` posterior actually a good one? The estimator is good, but because the prior is derived from the data, the posterior will be overconfident. This doesn't matter much if you just need k-hat, but I think you'd have to test how much the empirical prior affects the overconfidence.
I have tested. When using sample sizes typical for gpd_fit use and tail thicknesses in the interesting region (<0.7), the gpd_fit approximation is very good, with no practical difference compared to the marginals of the full posterior computed with Stan. I guess I should add this comment to the PSIS paper revision.
Autumn has been very busy, but I hope to be able to go through some PRs and issues in a few weeks.
Thanks to a question by @sethaxen, I realized that what is named densities above are not densities yet. Originally, the algorithm computes quadrature weights, and for those the most natural scale would be normalized weights. To compute densities, it is not enough to compute the normalization term, as each quadrature point represents a different volume of the parameter space. I'll prepare a note with an illustration and code. We can then discuss what we should return and how to document clearly what we are returning.
Here is an illustration. The yellow contours present the likelihood, the red dashed contours present the posterior, the blue line is the profile line, and the blue dots present the quadrature points. `w_theta` are the normalized likelihood values at the quadrature points, and the effect of the prior comes from the non-equidistant spacing of the quadrature points. To plot the marginal posterior of k, we need to evaluate the posterior density at the quadrature points (or at any other evaluation points).
The following code change returns the quadrature weights `k_w` that are specific to the quadrature points `k`, and the posterior densities `k_d` evaluated at the quadrature points `k`.
```r
# quadrature weights for k are the same as for theta
k_w <- w_theta

# quadrature weights are just the normalized likelihoods;
# we get the unnormalized posterior by multiplying these by the prior
k_d <- k_w * dgpd(-theta, mu = -1 / x[N], sigma = 1 / prior / xstar, xi = 0.5)

# normalize using the trapezoidal rule
Z <- sum((k_d[-M] + k_d[-1]) * (k[-M] - k[-1])) / 2
k_d <- k_d / Z

nlist(k_hat, sigma_hat, k, k_w, k_d)
```
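Assuming that return value, plotting the profile-based marginal posterior of k could look like this (hypothetical usage; the quadrature points come out in decreasing order, so sort first):

```r
fit <- gpdfit(x)
o <- order(fit$k)  # quadrature points are not in increasing order
plot(fit$k[o], fit$k_d[o], type = "l",
     xlab = "k", ylab = "marginal posterior density")
```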
Here is the exact marginal posterior (red dashed) and the profile posterior (blue) corresponding to the figure above.
The following plot shows comparisons, with one data realization each, using M=20, 100, 1000 and k=0.3, 0.5, 0.7, 0.9.
The exact marginal posterior is wider than the profile posterior, as expected, but the profile posterior is quite close and thus useful for presenting the uncertainty. In my experiments I used extraDistr::dgpd, but to avoid the dependency we should implement it in loo. Computing the prior added maybe 10% or less computation time, so the marginal approximation could also be moved to another function.
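For the dependency point, a minimal `dgpd()` sketch following extraDistr's (mu, sigma, xi) parameterization might look like this (my sketch, not necessarily what should land in loo):

```r
# GPD density: f(z) = (1/sigma) * (1 + xi*z)^(-1/xi - 1), z = (x - mu)/sigma
dgpd <- function(x, mu = 0, sigma = 1, xi = 0) {
  z <- (x - mu) / sigma
  d <- numeric(length(z))
  if (abs(xi) < 1e-12) {
    # xi == 0: exponential tail
    ok <- z >= 0
    d[ok] <- exp(-z[ok]) / sigma
  } else {
    # general case; support depends on the sign of xi
    ok <- z >= 0 & (1 + xi * z) > 0
    d[ok] <- (1 + xi * z[ok])^(-1 / xi - 1) / sigma
  }
  d
}
```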
> I have tested. When using sample sizes typical for gpd_fit use and tail thicknesses in the interesting region (<0.7), the gpd_fit approximation is very good, with no practical difference compared to the marginals of the full posterior computed with Stan. I guess I should add this comment to the PSIS paper revision.
>
> Autumn has been very busy, but I hope to be able to go through some PRs and issues in a few weeks.
I see; does that hold even though the samples aren't IID? That would also tend to make the posterior too narrow. Perhaps "correcting" the log-likelihood by multiplying it with `r_eff` would fix that problem well enough in practice, though I'm not sure such an approach would be fully justified.
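If that route were taken, the correction could be as simple as tempering the profile log-likelihood (a hypothetical sketch, not something in loo; `r_eff` as computed by `loo::relative_eff()`):

```r
# temper the profile log-likelihood by the relative efficiency of the draws
l_theta <- r_eff * N * (log(-theta / k) - k - 1)
w_theta <- exp(l_theta - matrixStats::logSumExp(l_theta))
```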
I do also wonder how Zhang's method compares to Laplace approximation for finding the posterior in terms of performance.
> I see; does that hold even though the samples aren't IID? That would also tend to make the posterior too narrow.
True. I don't think this is an issue, as the plan at the moment is just to provide some indication of the uncertainty. Of course, if you plan to use it for something that needs more accuracy, then follow the usual extreme value analysis procedure and thin the chains to have approximately independent draws. For the Pareto-k diagnostic and PSIS, a too-narrow posterior doesn't harm if there is no big change in the posterior mean, which seems to be true based on the experiments with dependencies typical for MCMC from Stan. If you are worried, then just thin.
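A rough sketch of that thinning step (hypothetical variable names; `ess` from e.g. `posterior::ess_basic()`):

```r
# keep roughly ess approximately independent draws before fitting
thin_by <- ceiling(length(draws) / ess)
draws_ind <- draws[seq(1, length(draws), by = thin_by)]
fit <- gpdfit(draws_ind)
```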
> I do also wonder how Zhang's method compares to Laplace approximation for finding the posterior in terms of performance.
Not a huge difference, but it doesn't matter as the posterior is much more skewed than normal and the posterior mode is then clearly different from the posterior mean.
- tried to update `gpdfit()` to reflect the improvements @avehtari mentioned above
- implemented the full suite of `dgpd()`/`pgpd()`/`qgpd()`/`rgpd()` functions
- cleaned up and condensed some internal helpers
@jgabry: the new functions pass my local checks and the existing tests, but it'd be great if you could also give them a try, perhaps with some fresh test cases.
Sorry, this has been left hanging for such a long time. After some consideration, I think we shouldn't change the existing output variable names (i.e., k -> k_hat, sigma -> sigma_hat), as the function may already be in use (at least I'm using it in different simulations), and renaming would break that.
I'm also thinking that maybe we should have a separate function for the marginal posterior. It is almost just a byproduct, but it still adds a bit of computation and memory usage, and maybe it would be better to keep the streamlined one for PSIS-LOO and have a separate function for the cases where someone wants to investigate more. It would then also be possible to add a computationally more intensive, but more accurate, marginal posterior computation; a rough interface sketch follows below.
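For concreteness, a self-contained sketch of such a separate function (the name and interface are placeholders; the grid construction mirrors the current `gpdfit()` source, and the density computation follows the snippet earlier in the thread, assuming a `dgpd()` like the one sketched above or `extraDistr::dgpd`):

```r
gpd_k_posterior <- function(x, min_grid_pts = 30) {
  x <- sort.int(x)
  N <- length(x)
  prior <- 3
  M <- min_grid_pts + floor(sqrt(N))
  jj <- seq_len(M)
  xstar <- x[floor(N / 4 + 0.5)]  # first quartile of the sample
  theta <- 1 / x[N] + (1 - sqrt(M / (jj - 0.5))) / prior / xstar

  k <- matrixStats::rowMeans2(log1p(-theta %o% x))  # quadrature points
  l_theta <- N * (log(-theta / k) - k - 1)          # profile log-lik
  k_w <- exp(l_theta - matrixStats::logSumExp(l_theta))

  # unnormalized posterior = likelihood weights times the implicit prior
  k_d <- k_w * dgpd(-theta, mu = -1 / x[N], sigma = 1 / prior / xstar, xi = 0.5)

  # normalize using the trapezoidal rule
  Z <- sum((k_d[-M] + k_d[-1]) * (k[-M] - k[-1])) / 2
  list(k = k, k_w = k_w, k_d = k_d / Z)
}
```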
Adding a comment that this will very likely go into the posterior package, so there is no need to merge this into loo.