Enforce that fixed_effects be passed as a formula (old title: predict.lm_robust not working with fixed_effects)
I've been trying to use predict() on an lm_robust object that includes fixed effects, but it keeps producing errors.
library(estimatr)
df <- data.frame(y = rnorm(100),
                 x = rnorm(100),
                 i = floor(0:99/10),
                 j = 0:99 %% 10)
#No fixed effects
lmr <- lm_robust(y~x, data=df)
predict(lmr,newdata=df)
#Works fine
#one fixed effect variable
lmr <- lm_robust(y~x, fixed_effects=i, data=df)
predict(lmr,newdata=df)
#Error: `x` must be a formula
#note: the quotes around x above are backquotes, not regular apostrophes
#two fixed effects variables
lmr <- lm_robust(y~x,fixed_effects=i+j,data=df)
predict(lmr,newdata=df)
#Error: object 'i' not found
I've been playing around with it. Clusters and weights don't seem to cause a problem, only fixed effects do. The error message also differs consistently depending on how many FE variables there are. There's no indication in help(predict.lm_robust) that it shouldn't be used with fixed_effects, so I'm wondering if this is a bug. This is using estimatr 0.18.0 downloaded from CRAN.
This is no good. I'm sorry about this! A fix is on the way.
Thank you very much for raising this bug @NickCH-K. You can solve this problem by passing the fixed effects as we intended, using a right-hand-side (one-sided) formula, like so:
lmr <- lm_robust(y~x, fixed_effects=~i, data=df)
predict(lmr,newdata=df)
The fact that it should be a RHS formula isn't all that clear, and the fact that it works without the tilde for estimation but not for prediction is problematic. I'm leaving this issue open until I implement a more flexible solution.
Unfortunately, we don't currently support predictions with multiple fixed effects, but we should add that as a new feature.
Ah that does make sense! Thank you.
Should we add a warning at estimation time when fixed effects are not passed as formulas? My intuition is that there are probably some nasty NSE / scoping issues that we could head off by requiring formulas.
I think I'd prefer to just enforce that they be passed as formulae.
I see, so something like this? https://github.com/nfultz/estimatr/commit/ce192679ea35cccd63a4a7e1f21142a77209a697
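The linked commit's contents aren't reproduced in this thread, so the following is only a minimal sketch of what "enforce formulas" could mean in base R, not estimatr's actual code. It assumes the fitting function captures `fixed_effects` unevaluated with `substitute()`; `check_fixed_effects()` and `toy_fit()` are hypothetical names used for illustration.

```r
# Hypothetical helper, NOT estimatr's implementation: accept only a
# one-sided formula such as ~ i or ~ i + j for `fixed_effects`.
check_fixed_effects <- function(fe_expr, env) {
  # Argument not supplied (default NULL): nothing to check.
  if (is.null(fe_expr)) return(NULL)
  # Evaluate in the caller's environment so a formula stored in a variable
  # still works. A bare column name like `i` either fails to evaluate
  # (caught here and turned into NULL) or yields a non-formula value.
  fe <- tryCatch(eval(fe_expr, env), error = function(e) NULL)
  # A one-sided formula has length 2 (`~` plus the RHS); `y ~ x` has length 3.
  if (!inherits(fe, "formula") || length(fe) != 2L) {
    stop("`fixed_effects` must be a one-sided formula, e.g. ~ i or ~ i + j",
         call. = FALSE)
  }
  fe
}

# Hypothetical toy wrapper showing where the check would sit in a fitting
# function that captures `fixed_effects` unevaluated.
toy_fit <- function(formula, fixed_effects = NULL, data) {
  fe_expr <- substitute(fixed_effects)
  caller <- parent.frame()
  fe <- check_fixed_effects(fe_expr, caller)
  list(formula = formula, fixed_effects = fe)
}

# Usage: `~i` is accepted; a bare `i` stops with a single clear message.
toy_fit(y ~ x, fixed_effects = ~i, data = NULL)$fixed_effects
try(toy_fit(y ~ x, fixed_effects = i, data = NULL))
```

Failing early at estimation time means `fixed_effects = i` and `fixed_effects = i + j` both produce the same explicit error, instead of fitting successfully and then failing later in predict() with two different messages.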