df for linear_hypothesis
I ran a short test of lh_robust against conducting the test ourselves by subsetting the data to the X == 1 observations (the linear hypothesis is the treatment effect when X == 1). The estimate, standard error, and test statistic agree exactly (I posted a small bug fix for those), but the p-value and CIs are quite different, reflecting the wrong degrees of freedom. When N is large the difference disappears, as both converge to a Z statistic.
> library(estimatr)
> library(fabricatr)  # fabricate() comes from fabricatr
> dat <- fabricate(
+ N = 40,
+ Y = rpois(N, lambda = 4),
+ X = rbinom(N, 1, 0.5),
+ Z = rbinom(N, 1, 0.5),
+ clusterID = sample(1:4, N, replace = TRUE)
+ )
> fit_lhr <- lh_robust(Y ~ X*Z, linear_hypothesis = "Z + X:Z = 0", data = dat)
> fit_lhr$lh
Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
Z + X:Z = 0 -1.375 0.9363419 -1.468481 0.150658 -3.273989 0.5239893 36
> fit_cate <- lm_robust(Y ~ Z, data = subset(dat, X == 1))
> fit_cate
Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
(Intercept) 4.875 0.7180703 6.789029 4.350066e-06 3.352759 6.3972411 16
Z -1.375 0.9363419 -1.468481 1.613588e-01 -3.359956 0.6099561 16
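As a quick check, recomputing by hand from the reported estimate and standard error confirms the only difference is the df:
est <- -1.375
se  <- 0.9363419
2 * pt(abs(est / se), df = 36, lower.tail = FALSE)  # 0.1507, the lh_robust p-value
2 * pt(abs(est / se), df = 16, lower.tail = FALSE)  # 0.1614, the subset p-value
est + c(-1, 1) * qt(0.975, df = 16) * se            # [-3.36, 0.61], the subset CI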
The multcomp package will also do this kind of test, and its solution for inference is to use the Z statistic:
> library(multcomp)
> fit_lmr <- lm_robust(Y ~ X*Z, data = dat)
> mod <- glht(fit_lmr, linfct = "Z + X:Z = 0")
> summary(mod)
Simultaneous Tests for General Linear Hypotheses
Fit: lm_robust(formula = Y ~ X * Z, data = dat)
Linear Hypotheses:
Estimate Std. Error z value Pr(>|z|)
Z + X:Z == 0 -1.3750 0.9363 -1.468 0.142
(Adjusted p values reported -- single-step method)
Note the estimate, standard error, and test statistic are identical, but the p-value differs because the statistic is treated as a Z value rather than a t value.
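And indeed the multcomp p-value is exactly what a standard normal reference gives for this statistic:
2 * pnorm(abs(-1.375 / 0.9363419), lower.tail = FALSE)  # 0.142, matches glht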
multcomp has a bug where it ignores the df you've provided it: the df lookup is hardcoded for lm and aov objects. See
https://github.com/cran/multcomp/blob/fccb9a9b45b222fbab50ec47cf5778d3990ec9e2/R/helpers.R#L136-L139
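If I'm reading the multcomp docs right, parm() lets you hand glht the coefficients, vcov, and df directly, which would sidestep the hardcoded lookup. An untested sketch (df = 16 here, to match the subset fit):
K <- rbind("Z + X:Z" = c(0, 0, 1, 1))  # columns: (Intercept), X, Z, X:Z
mod_t <- glht(parm(coef(fit_lmr), vcov(fit_lmr), df = 16), linfct = K)
summary(mod_t)  # should now use a t reference with 16 df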
PS: this weirdness about the DFs is much bigger than just linear hypotheses. In a regression, we use the same degrees of freedom for every t-test, which means we get different CIs for the same estimate depending on whether we fit an interaction or "subsample":
library(DeclareDesign)
library(broom)

dat <- fabricate(
  N = 20,
  Y = rnorm(N),
  X = rbinom(N, 1, 0.5),
  Z = rbinom(N, 1, 0.5)
)

# same point estimate for Z (the effect of Z when X == 0),
# but different df, hence different CIs
lm_robust(Y ~ Z * X, data = dat)
lm_robust(Y ~ Z, data = subset(dat, X == 0))

# also a problem for lm()
tidy(lm(Y ~ Z * X, data = dat), conf.int = TRUE)
tidy(lm(Y ~ Z, data = subset(dat, X == 0)), conf.int = TRUE)
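The per-coefficient degrees of freedom make this explicit; tidy() on an lm_robust fit reports a df column (quick sketch):
fit_full <- lm_robust(Y ~ Z * X, data = dat)
fit_sub  <- lm_robust(Y ~ Z, data = subset(dat, X == 0))
subset(tidy(fit_full), term == "Z", select = c(estimate, df, conf.low, conf.high))
subset(tidy(fit_sub),  term == "Z", select = c(estimate, df, conf.low, conf.high))
# full model: df = N - 4 = 16 for every coefficient
# subset fit: df = sum(dat$X == 0) - 2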
@acoppock I've encountered this at another job. The issue is that with the Z:X interaction term, you need to track the structural sparseness of that column. The bookkeeping gets pretty nasty even with simple models, à la Demand ~ (Price + lag(Demand) + Trend + AdSpend) * State * Brand. Most people just give up at that point.
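To make the structural sparseness concrete (a small sketch, reusing dat from above): the X:Z column of the design matrix is nonzero only where both X and Z are 1, so only a fraction of the sample carries any information about it.
mm <- model.matrix(~ Z * X, data = dat)
colSums(mm != 0)  # nonzero entries per design-matrix column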
See also the lme4 FAQs - https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#why-doesnt-lme4-display-denominator-degrees-of-freedomp-values-what-other-options-do-i-have
hoooo boy. Welp, at least if they diagnose the design, they'll know that the procedure has CIs that are maybe too small.