posterior_epred fails for stan_clogit models when factors in training and new data don't have the same levels
Summary:
When fitting a stan_clogit model whose training data contain factors, posterior_epred fails if the corresponding covariate in newdata contains only one level.
Details
I'm fitting a conditional logistic regression to a data set in which each stratum has 1 case and 5 controls, matched on date of birth +/- 180 days. I use restricted cubic splines for the most interesting continuous covariates; a few categorical covariates are encoded as factors in the data input. When using the posterior_epred function to get conditional effects, I've encountered two problems. This issue addresses one: factors in the data argument (in stan_clogit) aren't handled well by posterior_epred when newdata contains only one level of those factors.
The problem seems to arise because the covariates used for computing the conditional effects are selected by the number of columns in the data matrix (which is wrong in the presence of one-level factors) rather than by their names.
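To make the suspected mechanism concrete, here is a minimal base-R sketch. It is not rstanarm's internal code: the matrices stanmat and x are made up for illustration, and the extra column in x is an assumption chosen only to create a column-count mismatch. It shows how indexing the draws by column position can produce the same "subscript out of bounds" error, while indexing by column name does not.

```r
# Illustrative sketch only; this is NOT rstanarm's internal code.
set.seed(1)

# Hypothetical posterior draws: 10 draws of 4 named coefficients.
stanmat <- matrix(rnorm(40), nrow = 10,
                  dimnames = list(NULL, c("age", "spontaneous", "induced",
                                          "education6-11yrs")))

# Hypothetical design matrix for new data. It has one extra column
# (assumed here) with no counterpart among the draws.
x <- cbind("(Intercept)" = 1, age = 18:20, spontaneous = 0, induced = 0,
           "education6-11yrs" = 0)

# Positional selection: one index per design-matrix column (1:5),
# but stanmat has only 4 columns, so the subscript is out of bounds.
beta_sel <- seq_len(ncol(x))
res <- try(stanmat[, beta_sel, drop = FALSE], silent = TRUE)
inherits(res, "try-error")

# Name-based selection: use only the columns both matrices share.
shared <- intersect(colnames(x), colnames(stanmat))
eta <- x[, shared, drop = FALSE] %*% t(stanmat[, shared, drop = FALSE])
dim(eta)  # rows = new observations, columns = draws
```

Whether posterior_epred actually mismatches columns in this way is a guess based on the error message; the sketch only shows that a count-based index is fragile whenever the two matrices disagree on their columns.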
Reprex:
Prep work:
library(rstanarm)
library(dplyr)
dat <- arrange(infert, stratum) %>%
  filter(parity <= 2)
nd <- data.frame(age = 18:45, spontaneous = 0, induced = 0, stratum = "1",
                 education = "0-5yrs", case = c(1, rep(0, 27)))
nd2 <- data.frame(age = 18:45, spontaneous = 0, induced = 0, stratum = "1",
                  education = factor("0-5yrs", levels = levels(dat$education)),
                  case = c(1, rep(0, 27)))
nd3 <- data.frame(age = 18:45, spontaneous = 0, induced = 0, stratum = "1",
                  education = "6-11yrs",
                  case = c(1, rep(0, 27)))
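For readers comparing the three newdata frames, the following base-R sketch shows how their education columns differ. It assumes R 3.6.x behaviour, where data.frame() converts strings to factors by default; stringsAsFactors = TRUE is set explicitly so the result is the same on R >= 4.0. The frames are trimmed to two columns for brevity.

```r
# infert ships with base R (package datasets); same filter as in the reprex.
dat <- subset(infert, parity <= 2)

# Like nd / nd3: a string that becomes a factor with a single level.
nd_one <- data.frame(age = 18:45, education = "0-5yrs",
                     stringsAsFactors = TRUE)

# Like nd2: a factor that carries all training levels.
nd_all <- data.frame(age = 18:45,
                     education = factor("0-5yrs",
                                        levels = levels(dat$education)))

nlevels(nd_one$education)  # one level only
nlevels(nd_all$education)  # same number of levels as the training data

# With the full level set, the design matrix has the same education
# dummy columns as one built from the training data.
colnames(model.matrix(~ age + education, data = nd_all))
```

Since posterior_epred fails for all three frames in this report, aligning the factor levels is evidently not a sufficient workaround here; the sketch is only meant to show what distinguishes the inputs.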
WORKS
post <- stan_clogit(case ~ age + spontaneous + induced, strata = stratum,
                    data = dat, QR = TRUE)
mu <- posterior_epred(post, newdata = nd)
FAILS
The model fits just fine, but all these calls to posterior_epred fail with the error message "Error in stanmat[, beta_sel, drop = FALSE] : subscript out of bounds":
post2 <- stan_clogit(case ~ age + spontaneous + induced + education, strata = stratum,
                     data = dat, QR = TRUE)
posterior_epred(post2, newdata = nd)
posterior_epred(post2, newdata = nd2)
posterior_epred(post2, newdata = nd3)
Versions and OS:
I run rstanarm v2.21.1 in RStudio with R v3.6.3 on macOS 10.15 Catalina.