bayesplot
bayesplot copied to clipboard
Error using posterior_predict() or pp_check() on a rstanarm::stan_gamm4() model with binomial family
Hello,
There seems to be a tiny bug when using pp_check on a model fitted using rstanarm::stan_gamm4(... family = binomial(link = "logit")). When including a random effect, it seems like pp_check() and even the underlying posterior_predict() are not able to find the response variable (e.g. y), so they return the error: "Error in cbind(y, 1 - y) : object 'y' not found".
The problem does not occur if omitting the random effect from the stan_gamm4 model. I'm using:
- R version 4.1.0 on macos
- rstanarm version 2.21.1
- bayesplot version 1.8.0
Here's a reproducible example:
library(bayesplot)
library(rstanarm)
# add a fake grouping variable to the data
data("mtcars")
mtcars$group <- factor(rep(letters[1:10],100)[1:nrow(mtcars)])
# fit a GAMM (with random effect)
mod.gamm <- stan_gamm4(cbind(am,1-am) ~ s(wt),
random = ~ (1|group),
family = binomial(link = "logit"), data = mtcars,
cores = 4, seed = 321321)
posterior_predict(mod.gamm)
pp_check(mod.gamm)
# fit a GAM (no random effect)
mod.gam <- stan_gamm4(cbind(am,1-am) ~ s(wt),
family = binomial(link = "logit"), data = mtcars,
cores = 4, seed = 321321)
posterior_predict(mod.gam)
pp_check(mod.gam)
I looked into the data frame stored in the model and the response variables have been renamed to " cbind(am, 1 - am).am" and "cbind(am, 1 - am).V2". Adding the variable to the stored data frame seems to fix the problem.
mod.gamm$data$am <- mtcars$am
posterior_predict(mod.gamm)
I'm not sure whether that's a rstanarm issue or bayesplot issue, but the error appeared when using bayesplot function so I'm posting here. I hope that's ok!
Cheers
Just a quick update (issue still not resolved):
Reading the help for posterior_predict(), I see that for models with binomial distributions a two column response must have both variables in the data frame.
Doing so does not resolve the problem unfortunately. The response variables are still renamed in the data frame stored with the model. Again, this is not an issue when the random effect is omitted.
# add the other response variable to the data frame
mtcars$notam <- 1 - mtcars$am
# fit a GAMM (with random effect)
mod.gamm <- stan_gamm4(cbind(am,notam) ~ s(wt),
random = ~ (1|group),
family = binomial(link = "logit"), data = mtcars,
cores = 4, seed = 321321)
I'm also getting this issue.
Thanks for following up. This seems like an rstanarm bug (I see you opened an issue there too, thanks!). Going to close the bayesplot issue and hopefully we can get this sorted out in rstanarm.