numerical instability in beta_binomial_lpmf at large shape parameters
hi all,
(sorry if this is the wrong place for this)
I was fitting a mixture model that involves multiple calls to beta_binomial_lpmf, in which I put a linear model on log(concentration) before converting to the shape parameters passed to beta_binomial_lpmf. Fitting the model many hundreds of times over replicate datasets, I noticed that chains would occasionally get stuck at very high values of log(concentration) and of the log target density. I think this is due to numerical instability in how beta_binomial_lpmf is computed, probably when evaluating the log Beta function.
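For context, here's a minimal sketch of the kind of parameterization I mean (the variable names, the covariate, and the shared-mean structure are illustrative, not my actual model):

```stan
data {
  int<lower=1> N;
  vector[N] x;                        // covariate for the linear model
  array[N] int<lower=0> trials;
  array[N] int<lower=0> y;
}
parameters {
  real b0;
  real b1;
  real<lower=0, upper=1> mu;          // beta-binomial mean
}
model {
  // linear model on log(concentration), then convert to shapes
  vector[N] conc = exp(b0 + b1 * x);
  b0 ~ normal(0, 2);
  b1 ~ normal(0, 1);
  for (i in 1:N) {
    y[i] ~ beta_binomial(trials[i], mu * conc[i], (1 - mu) * conc[i]);
  }
}
```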
As a point of contrast, if you look at e.g. the VGAM implementation of the beta-binomial log-pmf in R, VGAM::dbetabinom.ab has the argument:
| Argument | Description |
|---|---|
| `Inf.shape` | Numeric. A large value such that, if `shape1` or `shape2` exceeds this, then special measures are taken, e.g., calling `dbinom`. Also, if `shape1` or `shape2` is less than its reciprocal, then special measures are also taken. This feature/approximation is needed to avoid numerical problem with catastrophic cancellation of multiple `lbeta` calls. |
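In Stan terms, that guard might look something like the sketch below; the cutoff and the binomial fallback at p = alpha / (alpha + beta) capture the general idea rather than VGAM's exact rules:

```stan
functions {
  // Sketch of a VGAM-style guard: for very large shapes the beta-binomial
  // is numerically indistinguishable from its binomial limit, so fall back
  // to binomial_lpmf at p = alpha / (alpha + beta). The cutoff is arbitrary,
  // standing in for VGAM's Inf.shape; this ignores VGAM's small-shape handling.
  real beta_binomial_guarded_lpmf(int y, int n, real alpha, real beta) {
    real inf_shape = exp(20);
    if (alpha > inf_shape || beta > inf_shape) {
      return binomial_lpmf(y | n, alpha / (alpha + beta));
    }
    return beta_binomial_lpmf(y | n, alpha, beta);
  }
}
```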
Here's a quick demo showing the behavior. If I fit the Stan model:
```stan
data {
  int<lower=0> count;
  int<lower=count> total;
  real<lower=0> sd_logconc;
}
parameters {
  real log_concentration;
}
model {
  log_concentration ~ normal(0, sd_logconc);
}
generated quantities {
  // Convert to alpha, beta
  real shape = exp(log_concentration) / 2;
  // Compare binomial PMF at p = 0.5 vs. beta-binomial
  real log_binomial_05 = binomial_lpmf(count | total, 0.5);
  real log_beta_binomial = beta_binomial_lpmf(count | total, shape, shape);
  real diff_lprob = log_beta_binomial - log_binomial_05;
}
```
with any count and total, and with sd_logconc set to a small value (e.g. count = 57, total = 117, sd_logconc = 5), then diff_lprob converges to 0 as log_concentration takes on high values, as expected:
but if you set sd_logconc = 50, its behavior grows erratic before settling at a high positive value:
(in this example, it starts showing this behavior around log_concentration > 33).
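That threshold is consistent with catastrophic cancellation in the textbook difference-of-lbeta form of the log-pmf, which I'm assuming is roughly what the built-in evaluates (a sketch):

```stan
functions {
  // Textbook form: log C(n, y) + lbeta(y + alpha, n - y + beta) - lbeta(alpha, beta).
  // At log_concentration ≈ 33 we have alpha = beta ≈ 1e14, so each lbeta term
  // is of order 1e14 in magnitude while their difference is O(1); with ~16
  // significant digits in a double, the subtraction leaves absolute errors
  // approaching order one, which matches the erratic behavior above.
  real beta_binomial_naive_lpmf(int y, int n, real alpha, real beta) {
    return lchoose(n, y) + lbeta(y + alpha, n - y + beta) - lbeta(alpha, beta);
  }
}
```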
The linear model I had on log(concentration) shouldn't have put much probability at all on parameters that could produce such high log_concentrations (which would induce a very informative prior on concentration), so I'm guessing the chain got there during a loosey-goosey adaptation phase (if the chain post-adaptation is initialized to wherever it ends up at the end of adaptation).
here's a code snippet for the graph: https://gist.github.com/NikVetr/2d99b235a66423429776762ab73c9388
Does this work or will we need to do what VGAM does?
```stan
functions {
  real beta_binomial_stable_lpmf(int y, int n, real alpha, real beta) {
    if (y < 0 || y > n) reject("beta_binomial_stable_lpmf: y out of range");
    if (!(alpha > 0 && beta > 0)) reject("beta_binomial_stable_lpmf: alpha, beta must be > 0");
    real lc = lchoose(n, y);
    real d1 = log_rising_factorial(alpha, y);        // lgamma(alpha + y) - lgamma(alpha)
    real d2 = log_rising_factorial(beta, n - y);     // lgamma(beta + n - y) - lgamma(beta)
    real d3 = log_rising_factorial(alpha + beta, n); // lgamma(alpha + beta + n) - lgamma(alpha + beta)
    return lc + d1 + d2 - d3;
  }
}
```
What I had before doesn't fix it, but this does. We can stay on the log scale, or we can internally log alpha and beta and calculate the log of the rising factorial with log inputs.
@andrjohns would adding the log_rising_factorial_log function be something you're interested in doing? Then we can update beta_binomial_lpmf to use this.
```stan
functions {
  // Rising Pochhammer in log-scale with log-parameter:
  real log_rising_factorial_log(real log_x, int m) {
    if (m <= 0) return 0.0;
    real acc = log_x; // base case, i.e. j == 0
    for (j in 1:m - 1) {
      acc += log_sum_exp(log_x, log(j));
    }
    return acc;
  }

  real beta_binomial_log_lpmf(int y, int n, real log_alpha, real log_beta) {
    // lchoose is already on the log scale and exact for integers
    real lp = lchoose(n, y);
    lp += log_rising_factorial_log(log_alpha, y);    // lgamma(α+y) - lgamma(α)
    lp += log_rising_factorial_log(log_beta, n - y); // lgamma(β+n−y) - lgamma(β)
    real log_alpha_plus_beta = log_sum_exp(log_alpha, log_beta);
    lp -= log_rising_factorial_log(log_alpha_plus_beta, n);
    return lp;
  }

  real beta_binomial_stable_lpmf(int y, int n, real alpha, real beta) {
    // lchoose is already on the log scale and exact for integers
    real lp = lchoose(n, y);
    real log_alpha = log(alpha);
    real log_beta = log(beta);
    lp += beta_binomial_log_lpmf(y | n, log_alpha, log_beta);
    return lp;
  }
}
```
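A nice side effect of the log-input version is that callers can stay on the log scale end-to-end and never exponentiate the concentration at all; e.g., for the equal-shape demo above (a sketch using its variable names):

```stan
model {
  // alpha = beta = exp(log_concentration) / 2, so on the log scale
  // log_alpha = log_beta = log_concentration - log(2), with no exp() anywhere
  real log_shape = log_concentration - log2();
  target += beta_binomial_log_lpmf(count | total, log_shape, log_shape);
}
```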
@NikVetr can you test and let me know if this works for your model?
Interesting! I'm on flights today and catching up on a Stan-backlog, will add to the list!
@spinkney hmm, I haven't tried it with whatever model I was actually working on at the time (not sure I remember which one it would have been 😅), but playing around in the more general setting, I can confirm that the first function has the same issue and the second function works! With one modification: you add an extra lchoose(n, y) in beta_binomial_stable_lpmf that you'd already added in beta_binomial_log_lpmf, so it gets double counted
(since it's a constant that depends only on the data and not on any parameters, it wouldn't affect things in the direct sampling context, but I think it would affect e.g. model comparison / mixtures)
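With that extra term dropped, the wrapper reduces to something like (a sketch of the fix):

```stan
// corrected wrapper: the lchoose(n, y) term now lives only in beta_binomial_log_lpmf
real beta_binomial_stable_lpmf(int y, int n, real alpha, real beta) {
  return beta_binomial_log_lpmf(y | n, log(alpha), log(beta));
}
```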
I also realize another issue -- since the beta-binomial lpmf approaches the binomial lpmf at sufficiently large log-concentrations, if users put uniform priors on log-concentration (proper or improper), you'd eventually run into flat-gradient issues, at least during adaptation, right? I'd expect most folks would use something weakly informative, e.g. a half-normal. Edit: testing it out in simulation with something super uninformative, i.e. n = 5, I do get divergences, but they mostly go away at n = 10 with adapt_delta = 0.99.
Yes, thanks for spotting that double counting, I forgot to remove it when updating.
For the divergences with large log concentration parameters and uniform priors, do you find this with the updated function using log rising factorial with log inputs?
@spinkney is the goal to stay on the log scale for overflow or underflow? Would you roughly know what values the problem starts at?
Overflow, the issue starts around 28 for log concentration.