posterior_epred fails for stan_clogit models when factors in training and new data don't have the same levels

Open epiben opened this issue 4 years ago • 0 comments

Summary:

When a stan_clogit model is fitted to training data that contain factors, posterior_epred fails if the corresponding covariate in newdata contains only one of the factor's levels.

Details:

I'm fitting a conditional logistic regression to a data set in which each stratum has 1 case and 5 controls, matched on date of birth +/- 180 days. I use restricted cubic splines for the most interesting continuous covariates; a few categorical covariates are encoded as factors in the input data. When using the posterior_epred function to get conditional effects, I've encountered two problems. This issue addresses one of them: factors in the data argument (in stan_clogit) aren't handled well by posterior_epred when newdata contains only one level of those factors.

The problem seems to arise because the covariates used for computing the conditional effects are selected by the number of columns in the data matrix (which is incorrect in the presence of one-level factors) rather than by their names.
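
To make the suspected mechanism concrete, here is a minimal base-R sketch of what goes wrong when columns are picked by count instead of by name. It is a toy illustration only: `draws`, `X`, and the column `extra` are hypothetical stand-ins and are not taken from rstanarm's internals. It assumes the data matrix ends up with one more column than there are coefficient draws.

```r
# Toy illustration only: `draws`, `X`, and `extra` are hypothetical stand-ins,
# not objects taken from rstanarm's internals.
set.seed(1)

# Pretend posterior draws for three coefficients (10 draws each)
draws <- matrix(rnorm(30), ncol = 3,
                dimnames = list(NULL, c("age", "spontaneous", "induced")))

# Pretend data matrix for new data that carries one extra column
X <- cbind(extra = 1, age = 18:20, spontaneous = 0, induced = 0)

# Selecting by column count asks for columns 1:4 of a 3-column matrix,
# which raises an error; tryCatch() captures its message as a string
by_count <- tryCatch(
  draws[, seq_len(ncol(X)), drop = FALSE],
  error = function(e) conditionMessage(e)
)
print(by_count)

# Selecting by name keeps only the columns that both objects share
shared <- intersect(colnames(X), colnames(draws))
by_name <- draws[, shared, drop = FALSE]

# Linear predictor: one row per draw, one column per row of new data
eta <- by_name %*% t(X[, shared, drop = FALSE])
print(dim(eta))
```

Selection by name does not depend on how many columns the data matrix happens to have, which is why it would be more robust if the column counts really do diverge as described above.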

Reprex:

Prep work:

library(rstanarm)
library(dplyr)

dat <- arrange(infert, stratum) %>%
  filter(parity <= 2)

nd <- data.frame(age = 18:45, spontaneous = 0, induced = 0, stratum = "1", 
                 education = "0-5yrs", case = c(1, rep(0, 27)))
nd2 <- data.frame(age = 18:45, spontaneous = 0, induced = 0, stratum = "1", 
                  education = factor("0-5yrs", levels = levels(dat$education)), 
                  case = c(1, rep(0, 27)))
nd3 <- data.frame(age = 18:45, spontaneous = 0, induced = 0, stratum = "1", 
                  education = "6-11yrs", 
                  case = c(1, rep(0, 27)))

WORKS

post <- stan_clogit(case ~ age + spontaneous + induced, strata = stratum,
                    data = dat, QR = TRUE)
mu <- posterior_epred(post, newdata = nd) 

FAILS

The model fits just fine, but all these calls to posterior_epred fail with the error message Error in stanmat[, beta_sel, drop = FALSE] : subscript out of bounds:

post2 <- stan_clogit(case ~ age + spontaneous + induced + education, strata = stratum,
                     data = dat, QR = TRUE)
posterior_epred(post2, newdata = nd) 
posterior_epred(post2, newdata = nd2) 
posterior_epred(post2, newdata = nd3) 
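
As a quick diagnostic, the factor levels in the training data and in newdata can be compared directly. The sketch below is not a fix; it rebuilds `dat`, `nd`, and `nd2` from the prep work in base R (an assumption made so that it runs without dplyr) and introduces the helper names `train_levels`, `nd_levels`, `missing_in_nd`, and `same_levels_nd2` purely for illustration.

```r
# Base-R rebuild of the data from the prep work (no dplyr needed)
dat <- infert[infert$parity <= 2, ]
dat <- dat[order(dat$stratum), ]

nd <- data.frame(age = 18:45, spontaneous = 0, induced = 0, stratum = "1",
                 education = "0-5yrs", case = c(1, rep(0, 27)))
nd2 <- data.frame(age = 18:45, spontaneous = 0, induced = 0, stratum = "1",
                  education = factor("0-5yrs", levels = levels(dat$education)),
                  case = c(1, rep(0, 27)))

# factor() makes the comparison independent of the stringsAsFactors default,
# which changed between R 3.x and R 4.x
train_levels <- levels(dat$education)
nd_levels <- levels(factor(nd$education))

# Levels known to the training data but absent from nd
missing_in_nd <- setdiff(train_levels, nd_levels)
print(missing_in_nd)

# nd2 declares the full set of levels even though only one is observed
same_levels_nd2 <- identical(levels(nd2$education), train_levels)
print(same_levels_nd2)
```

Since the call with nd2 also fails, declaring the full set of levels in newdata does not appear to be sufficient to avoid the error.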

Versions and OS:

I run rstanarm v2.21.1 in RStudio with R v3.6.3 on macOS 10.15 Catalina.

epiben, Sep 11 '20 11:09