Revisions to prediction with lm_lin()
Currently, predict() does not work for lm_lin() models with multi-valued or factor treatments. The cause is how predict.lm_robust() builds the Lin estimator's model matrix from new data. The treatment name saved in the lm_lin() model object is the original variable name, but that variable may have been expanded into several columns of the model matrix, so the names disagree when the treatment x covariate interactions are created. As a result, either the original variable name doesn't exist in the revised model matrix, or the new-data model matrix doesn't have the right dimensions to be multiplied by the coefficients.
See here.
For example:
library(estimatr)
set.seed(60637)
N <- 40
dat <- data.frame(
  x = rnorm(N, mean = 2.3),
  x2 = rpois(N, lambda = 2),
  x3 = runif(N)
)
dat$y0 <- rnorm(N) + dat$x
dat$y1 <- dat$y0 + 0.35
dat$y2 <- dat$y0 + 0.55
dat$z_multi <- sample(0:2, size = nrow(dat), replace = TRUE)
dat$z_bin <- 1 * (dat$z_multi > 0)
dat$y <- (dat$z_multi == 0) * dat$y0 +
  (dat$z_multi == 1) * dat$y1 +
  (dat$z_multi == 2) * dat$y2
# Multi-valued numeric treatments with lm_lin; estimation works as expected
lmlin_mult <- lm_lin(y ~ z_multi, covariates = ~ x, data = dat)
# prediction does not
predict(lmlin_mult, newdata = dat)
# Error in X[, !beta_na, drop = FALSE] :
# (subscript) logical subscript too long
# Binary treatment wrapped in as.factor() with lm_lin; estimation works,
lmlin_bin_f <- lm_lin(y ~ as.factor(z_bin), covariates = ~ x + x2 + x3, data = dat)
# prediction breaks
predict(lmlin_bin_f, newdata = dat)
# Error in X[, treat_name] : subscript out of bounds
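The naming mismatch behind the second error can be reproduced with base R alone. This is only an illustration of how model.matrix() names and expands treatment columns, not a trace of estimatr's internals; the data frame `d` and the objects `X_f` and `X_m` are made up for the example.

```r
# Base R only: how model.matrix() renames and expands treatment columns.
# `d`, `X_f` and `X_m` are illustrative names, not estimatr objects.
d <- data.frame(
  z_bin = rep(0:1, times = 5),
  z_multi = rep(0:2, length.out = 10)
)

# A factor treatment gets a level suffix appended to the term name ...
X_f <- model.matrix(~ as.factor(z_bin), data = d)
colnames(X_f)

# ... so indexing by the original term name cannot find a column.
"as.factor(z_bin)" %in% colnames(X_f)

# A three-level treatment coded as a factor expands to two indicator
# columns, which changes the number of columns the coefficients must match.
X_m <- model.matrix(~ factor(z_multi), data = d)
ncol(X_m)
```

Anything that looks up the treatment by the stored term name, or assumes a single treatment column, will run into one of the two failures above.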
More detail in gist here
A revision to handle setting up treatment columns in the new data could be implemented in get_X().
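To make the idea concrete, here is a rough base R sketch of what setting up the treatment columns for new data might involve. `build_lin_X()` is a hypothetical helper written for this issue, not estimatr's get_X(), and the column naming is illustrative. It assumes the treatment levels and covariate means from the estimation sample are stored and reused for new data.

```r
# Hypothetical helper (NOT part of estimatr): builds a Lin-style design matrix
# with treatment indicators, centered covariates, and their interactions.
build_lin_X <- function(z, covs, z_levels, cov_means) {
  # Fixing the levels keeps the treatment columns stable even when the
  # new data do not contain every treatment arm.
  z_f <- factor(z, levels = z_levels)
  Z <- model.matrix(~ z_f)[, -1, drop = FALSE]
  # Center with the estimation-sample means, not the means of the new data.
  Xc <- sweep(as.matrix(covs), 2, cov_means)
  # One interaction column per (treatment column, covariate) pair.
  ZX <- do.call(cbind, lapply(seq_len(ncol(Z)), function(j) Z[, j] * Xc))
  colnames(ZX) <- as.vector(
    outer(colnames(Xc), colnames(Z), function(x, z) paste0(z, ":", x))
  )
  cbind("(Intercept)" = 1, Z, Xc, ZX)
}

set.seed(60637)
n <- 40
train <- data.frame(
  z = rep(0:2, length.out = n),
  x = rnorm(n, mean = 2.3),
  x2 = rpois(n, lambda = 2)
)
train$y <- 0.3 * train$z + train$x + rnorm(n)

# Quantities that would need to be saved at estimation time.
z_levels <- sort(unique(train$z))
cov_means <- colMeans(train[, c("x", "x2")])

X_train <- build_lin_X(train$z, train[, c("x", "x2")], z_levels, cov_means)
fit <- lm.fit(X_train, train$y)

# New data containing a single treatment arm still gets every column.
new <- train[train$z == 1, ][1:3, ]
X_new <- build_lin_X(new$z, new[, c("x", "x2")], z_levels, cov_means)
pred <- drop(X_new %*% fit$coefficients)
```

Because the new-data matrix is built from the stored levels and means, it has the same columns as the estimation matrix regardless of which arms appear in the new data, so the multiplication by the coefficients is well defined.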
Also, thank you all for making such a useful package!