estimatr
estimatr copied to clipboard
lh_test mishandling joint tests?
Correct behavior for a joint test:
data <- data.frame(X1 = rnorm(10),
X2 = rnorm(10),
X3 = rnorm(10),
Y = rnorm(10))
lm(Y ~ X1+X2+X3, data = data) %>%
car::linearHypothesis(c("X1", "X2"))
Produces a single test of the joint hypothesis:
> lm(Y ~ X1+X2+X3, data = data) %>%
+ car::linearHypothesis(c("X1", "X2"))
Linear hypothesis test
Hypothesis:
X1 = 0
X2 = 0
Model 1: restricted model
Model 2: Y ~ X1 + X2 + X3
Res.Df RSS Df Sum of Sq F Pr(>F)
1 8 4.86
2 6 4.68 2 0.178 0.11 0.89
However lh_robust produces:
> lh_robust(Y ~ X1+X2+X3, data = data,
+ linear_hypothesis = c("X1", "X2"))
$lm_robust
Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
(Intercept) 0.24 0.3 0.79 0.5 -0.5 1.0 6
X1 0.01 0.2 0.05 1.0 -0.6 0.6 6
X2 0.11 0.1 0.74 0.5 -0.2 0.5 6
X3 -0.08 0.2 -0.35 0.7 -0.6 0.5 6
$lh
Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
X1 0.0125 0.243 0.0514 0.961 -0.581 0.606 6
X2 0.1091 0.147 0.7439 0.485 -0.250 0.468 6
-
The issue appears to be that
return_lh_robustre-calculates a series of separate tests fromattr(car_lht, "value"))andsqrt(diag(attr(car_lht, "vcov"))even if these are not requested. -
May be enough to flag in documentation that printed output is for single hypotheses and not for joint hypotheses when these are requested; (as noted) the original output is available as an attribute on the lh object.
May be enough to flag in documentation that printed output is for single hypotheses and not for joint hypotheses when these are requested; (as noted) the original output is available as an attribute on the lh object.
IIRC (one of) the original intent was to make it easier to test different pairs of levels in a factor w/o having to refit, which usually takes the form
lh_robust(Y ~ X1+X2+X3, data = data, linear_hypothesis=c("1*X1- 1*X2 = 0"))
I generally feel confident about that use case.
But it is definitely worth making the documentation clearer that the omnibus tests that car::linearHypothesis prints out by default are basically ignored, and literally yields one test per row of linear_hypothesis.
And to illustrate why/how it is different, for example:
> lm(Y ~ X1+X2+X3, data = data) %>% car::linearHypothesis(c("X1=0", "X2=0"))
Linear hypothesis test
Hypothesis:
X1 = 0
X2 = 0
Model 1: restricted model
Model 2: Y ~ X1 + X2 + X3
Res.Df RSS Df Sum of Sq F Pr(>F)
1 8 11.3058701
2 6 8.6222509 2 2.6836192 0.93373 0.44356
> lm(Y ~ X1+X2+X3, data = data) %>% car::linearHypothesis(c("X1+X2=0"))
Linear hypothesis test
Hypothesis:
X1 + X2 = 0
Model 1: restricted model
Model 2: Y ~ X1 + X2 + X3
Res.Df RSS Df Sum of Sq F Pr(>F)
1 7 11.2911837
2 6 8.6222509 1 2.6689328 1.85724 0.22187
> lh_robust(Y ~ X1+X2+X3, data = data, linear_hypothesis= "X1+X2=0")
$lm_robust
Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
(Intercept) -0.4563304929 0.5034293555 -0.9064439488 0.3996380783 -1.688177749 0.7755167633 6
X1 -0.2313394433 0.5478439885 -0.4222724866 0.6875394457 -1.571865391 1.1091865047 6
X2 -0.4277196300 0.4822974013 -0.8868379320 0.4093025385 -1.607858857 0.7524195971 6
X3 -0.2169584712 0.6851843731 -0.3166424684 0.7622420790 -1.893544234 1.4596272917 6
$lh
Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
X1+X2=0 -0.6590591 0.3989217 -1.652101 0.1496022 -1.635185 0.3170671 6
Also, this comment gives me pause: https://github.com/DeclareDesign/estimatr/blob/b734297abfbf51cef3c0e271fbffc1a81ed8e06f/R/estimatr_lh_robust.R#L81