
Bridgesampling Error

lsolomyak opened this issue 5 years ago • 6 comments

When I run bridgesampling on a Stan fit I get the following error:

Iteration: 1
Error in out[!from@positive] <- -out[!from@positive] : 
  NAs are not allowed in subscripted assignments
In addition: Warning messages:
1: 16000 of the 16000 log_prob() evaluations on the posterior draws produced -Inf/Inf. 
2: 16000 of the 16000 log_prob() evaluations on the proposal draws produced -Inf/Inf.

In my fit object there is one variable that has NA values, but it's one that is not of interest to me. Will this prevent bridgesampling from running, requiring me to re-run the Stan fit? Or are there any workarounds to help bridgesampling still run?

Thank you LS

lsolomyak avatar Dec 16 '19 13:12 lsolomyak

Hey, I'm facing the same issue... However, I don't find any NAs in my fit object. Has there ever been a solution?

ktrutmann avatar Mar 24 '21 12:03 ktrutmann

Okay, I don't know whether this will help you @lsolomyak (also considering you asked over a year ago), but in my case the problem was related to this rstan bug: https://github.com/stan-dev/rstan/issues/649

In short, when you write samples to .csv files and tell Stan not to save the warmup samples, it will still write n_samples + n_warmup rows in the sample files. The result is that the first n_samples rows are filled with your samples and the trailing n_warmup rows are just filled with a bunch of zeros.

I have written a function that takes the stanfit object and cuts away the superfluous rows. You can find it here: https://gist.github.com/ktrutmann/e8c5b47f8467ac7f60e56412ab899cdd
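The idea behind that function can be sketched as follows (a minimal Python sketch of the concept only; the actual gist is written in R and operates on the stanfit object, and the shapes here are toy values):

```python
import numpy as np

def drop_warmup_rows(draws, n_samples):
    # Keep only the first n_samples rows; due to the rstan CSV bug,
    # the trailing n_warmup rows are spurious all-zero rows.
    return draws[:n_samples, :]

draws = np.vstack([np.full((4, 3), 0.5),  # 4 genuine draws
                   np.zeros((2, 3))])     # 2 spurious zero rows
trimmed = drop_warmup_rows(draws, n_samples=4)
print(trimmed.shape)  # (4, 3)
```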

Hope this helps anyone else running into this issue.

ktrutmann avatar Mar 30 '21 13:03 ktrutmann

Hi there,

I'm also seeing this error, but possibly for a different reason. I'm doing a simulation study in which I repeatedly estimate Bayes factors for a given dataset and model (i.e. I fit the brms model repeatedly to the same data). On most simulation runs this works like a charm, but in a few runs I get the above error. I don't save the samples but immediately call bridge_sampler() in the same session.

Reproducible example

The model was fit using cmdstanr, so this requires a recent version of brms. I use:

 package        * version    date       lib source        
 brms           * 2.16.1     2021-08-23 [1] CRAN (R 4.1.1)

(If I remember correctly, I also sometimes got this error using the rstan backend, but I'm not entirely sure at this point.)

Download the stanfit object here.

library("brms")
mod <- readRDS("brms_samples.rds")
bridge_sampler(mod)
Error in out[!from@positive] <- -out[!from@positive] : 
  NAs are not allowed in subscripted assignments
In addition: Warning messages:
1: effective sample size cannot be calculated, has been replaced by number of samples. 
2: 1 of the 60000 log_prob() evaluations on the posterior draws produced -Inf/Inf. 
3: 100 of the 60000 log_prob() evaluations on the proposal draws produced -Inf/Inf. 
Error: Bridgesampling failed. Perhaps you did not set 'save_pars = save_pars(all = TRUE)' when fitting your model?

Debugging attempt

I've done some digging, but got stuck when I hit rstan code. I'll outline my attempts to trace this back to its origin below. At this point, I'm not even sure this is a bridgesampling issue. If you can confirm that it's either an rstan or a brms issue, I'll be more than happy to raise this at the respective repository.

I have traced the error to a call of bridge_sampler.stanfit(). Specifically, it seems to originate in as.numeric.brob(). It appears that the following lines are the source:

https://github.com/quentingronau/bridgesampling/blob/1b78f76b0536e61a936d24adf9323c797482673e/R/bridge_sampler_internals.R#L270-L271

Further digging shows that this line produces one NA:

https://github.com/quentingronau/bridgesampling/blob/1b78f76b0536e61a936d24adf9323c797482673e/R/bridge_sampler_internals.R#L234

which in turn yields an NA here as well:

https://github.com/quentingronau/bridgesampling/blob/1b78f76b0536e61a936d24adf9323c797482673e/R/bridge_sampler_internals.R#L249

The NA is caused by taking the difference of two -Inf values.

> q11[which(is.na(l1))]
[1] -Inf

> q12[which(is.na(l1))]
[1] -Inf
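This is plain IEEE floating-point behavior: subtracting -Inf from -Inf is undefined and yields NaN (which then surfaces as NA in the subscripted assignment). A quick Python illustration of the same arithmetic:

```python
import math

q11 = float("-inf")  # log posterior density at the problematic draw
q12 = float("-inf")  # log proposal density at the same draw
diff = q11 - q12     # -inf minus -inf is undefined under IEEE 754
print(math.isnan(diff))  # True
```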

I suppose it's expected to see -Inf in q11, but, if I understand correctly, q12 originates directly from this call:

https://github.com/quentingronau/bridgesampling/blob/1b78f76b0536e61a936d24adf9323c797482673e/R/bridge_sampler_normal.R#L40

Now, it seems samples_4_iter is constructed from the unconstrained samples.

https://github.com/quentingronau/bridgesampling/blob/1b78f76b0536e61a936d24adf9323c797482673e/R/bridge_sampler.R#L221

Comparing the original to the unconstrained samples shows the problem, I think: unconstraining the samples produces an infinite value.

> which(is.infinite(ex), arr.ind = T)
     iterations chains parameters

> which(is.infinite(upars), arr.ind = T)
         iterations chains
[1,] 131      14398      3

> upars[which(is.infinite(upars), arr.ind = T)]
[1] Inf

Inf yields a density for the multivariate normal of 0, i.e. a log density of -Inf. As an aside, I think it is also this infinite value that causes the calculation of the effective sample size to fail.
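This can be verified directly: a Gaussian log density evaluated at an infinite point is -Inf, because the quadratic term dominates. A minimal Python sketch of the univariate case (the multivariate normal proposal used by bridgesampling behaves the same way):

```python
import math

def norm_logpdf(x, mu=0.0, sigma=1.0):
    # Log density of a normal distribution; at x = +/-inf the
    # quadratic term is -inf and dominates the finite constant.
    return -0.5 * math.log(2 * math.pi * sigma**2) - 0.5 * ((x - mu) / sigma) ** 2

print(norm_logpdf(0.0))           # about -0.919
print(norm_logpdf(float("inf")))  # -inf
```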

I'm not sure I understand exactly what happens here, but I think all this happens in rstan::unconstrain_pars(), because from what I can tell the generation of skeleton as well as .rstan_relist() work as intended, whereas the unconstrained samples appear to contain only about a third of the parameters. Is this expected behavior?

> str(samples@par_dims[samples@sim$pars_oi])
List of 12
 $ b_Intercept: int(0) 
 $ b          : int 1
 $ sd_1       : int 2
 $ cor_1      : int 1
 $ Intercept  : int(0) 
 $ r_1        : int [1:2] 63 2
 $ r_1_1      : int 63
 $ r_1_2      : int 63
 $ lp__       : int(0) 
 $ z_1        : int [1:2] 2 63
 $ L_1        : int [1:2] 2 2
 $ Cor_1      : int [1:2] 2 2
 
 > dim(samples)[3]
[1] 393
 
 > str(lapply(skeleton, dim))
List of 12
 $ b_Intercept: NULL
 $ b          : int 1
 $ sd_1       : int 2
 $ cor_1      : int 1
 $ Intercept  : NULL
 $ r_1        : int [1:2] 63 2
 $ r_1_1      : int 63
 $ r_1_2      : int 63
 $ lp__       : NULL
 $ z_1        : int [1:2] 2 63
 $ L_1        : int [1:2] 2 2
 $ Cor_1      : int [1:2] 2 2

> sum(sapply(lapply(skeleton, dim), prod))
[1] 393

> str(lapply(.rstan_relist(ex[1,1, ], skeleton), dim))
List of 12
 $ b_Intercept: NULL
 $ b          : int 1
 $ sd_1       : int 2
 $ cor_1      : int 1
 $ Intercept  : NULL
 $ r_1        : int [1:2] 63 2
 $ r_1_1      : int 63
 $ r_1_2      : int 63
 $ lp__       : NULL
 $ z_1        : int [1:2] 2 63
 $ L_1        : int [1:2] 2 2
 $ Cor_1      : int [1:2] 2 2
 
> sum(sapply(lapply(.rstan_relist(ex[1,1, ], skeleton), dim), prod))
[1] 393
> str(ex)
 num [1:20000, 1:6, 1:393] -1.82 -2.13 -1.99 -2.26 -1.83 ...
 - attr(*, "dimnames")=List of 3
  ..$ iterations: NULL
  ..$ chains    : chr [1:6] "chain:1" "chain:2" "chain:3" "chain:4" ...
  ..$ parameters: chr [1:393] "b_Intercept" "b_drm_lure" "sd_id__Intercept" "sd_id__drm_lure" ...

> str(upars)
 num [1:131, 1:20000, 1:6] -0.232 -1.94 -0.51 -1.531 -0.787 ...
 - attr(*, "dimnames")=List of 3
  ..$           : NULL
  ..$ iterations: NULL
  ..$ chains    : chr [1:6] "chain:1" "chain:2" "chain:3" "chain:4" ...

crsh avatar Aug 29 '21 21:08 crsh

I found some time to do a little more digging. I now understand that rstan::unconstrain_pars() returns only model parameters (no transformed parameters, generated quantities, etc.). I think this accounts for the reduced number of parameters compared to the original samples (131 vs. 393). So I tried to find out which of the unconstrained parameters causes the problem. Because the object returned by rstan::unconstrain_pars() does not give parameter names, I manually unconstrained the parameters and, I think, found the culprit to be the correlation between the random intercept and slope (the link to download the fit is in my previous post):

library("rstan")
library("brms")
library("bridgesampling")

brms_samples <- readRDS("brms_samples.rds")

# brms internal code
samples <- brms:::restructure(brms_samples)
samples <- brms:::update_misc_env(samples)
samples <- samples$fit
stanfit_model <- samples

# bridgesampling internal code
ex <- rstan::extract(samples, permuted = FALSE)
skeleton <- bridgesampling:::.create_skeleton(samples@sim$pars_oi,
                                              samples@par_dims[samples@sim$pars_oi])
upars <- apply(ex, 1:2, FUN = function(theta) {
  rstan::unconstrain_pars(stanfit_model, bridgesampling:::.rstan_relist(theta, skeleton))
})

ex2 <- rstan::extract(samples, c("b", "Intercept", "sd_1", "z_1", "L_1"), permuted = FALSE)
> which(is.infinite(upars), arr.ind = T)
         iterations chains
[1,] 131      14398      3

> upars[131, 1, 1]
 chain:1 
0.734979

> atanh(ex2[1, 1, 132, drop = FALSE])
, , parameters = L_1[2,1]

          chains
iterations  chain:1
      [1,] 0.734979

So the Inf is caused by an MCMC sample for the correlation that is exactly 1:

> ex2[14398, 3, 132]
[1] 1

> atanh(ex2[14398, 3, 132])
[1] Inf
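As a sanity check on the 131 vs. 393 discrepancy from my previous comment: the raw model parameters alone seem to account for all 131 unconstrained values (my own bookkeeping, using the par_dims output above and assuming the 2x2 Cholesky factor L_1 contributes a single free element on the unconstrained scale):

```python
# Constrained dims taken from the par_dims output above.
n_unconstrained = (
    1         # b
    + 1       # Intercept
    + 2       # sd_1
    + 2 * 63  # z_1
    + 1       # L_1: one free off-diagonal element when unconstrained
)
print(n_unconstrained)  # 131
```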

I suppose this issue could be hot-fixed in bridgesampling by replacing Infs in unconstrained correlations with something like atanh(0.9999999999999999444439999), but this seems very undesirable. Moreover, if I understand correctly, this appears to be a numerical issue that could similarly arise when a positively constrained parameter is sampled to be exactly 0 (which would yield -Inf in the unconstrained samples). So now I wonder where this issue is best addressed: in brms or in rstan? Maybe @paul-buerkner would know?
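Such a hot-fix would amount to clamping correlations strictly inside (-1, 1) before unconstraining. A Python sketch of the idea (NumPy's arctanh, like R's atanh, returns inf at exactly 1; safe_atanh is a hypothetical helper, not part of any package, and the eps value is an arbitrary choice):

```python
import numpy as np

def safe_atanh(r, eps=1e-15):
    # Clamp the correlation strictly inside (-1, 1) so the
    # unconstraining transform stays finite.
    return np.arctanh(np.clip(r, -1.0 + eps, 1.0 - eps))

with np.errstate(divide="ignore"):
    print(np.arctanh(1.0))  # inf: the problematic draw
print(safe_atanh(1.0))      # large but finite
```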

crsh avatar Sep 06 '21 15:09 crsh

I don't know, to be honest. Since the main functions are in rstan, I guess it would be better handled there.

paul-buerkner avatar Sep 06 '21 17:09 paul-buerkner