estimatr
estimatr copied to clipboard
FE case where clustered se_type = "stata" SEs do not agree with stata
Stata SE's don't agree in the below case with two-way fixed effects when I expected them to. If this is expected feel free to close. Data is RDS saved from R from repl.dta read in via code below. repl.RDS.zip
Agrees w/o fixed effects:
. use repl.dta
. reg conflict dlpi_lev, robust cluster(ccode2)
Linear regression Number of obs = 2,425
F(1, 96) = 10.80
Prob > F = 0.0014
R-squared = 0.0021
Root MSE = .38377
(Std. Err. adjusted for 97 clusters in ccode2)
------------------------------------------------------------------------------
| Robust
conflict | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
dlpi_lev | -.0182238 .005546 -3.29 0.001 -.0292326 -.0072151
_cons | .1827658 .0307032 5.95 0.000 .1218203 .2437113
------------------------------------------------------------------------------
repl <- haven::read_dta("repl.dta")
lm_robust(conflict ~ dlpi_lev, clusters = ccode2, se_type = "stata", data = repl)
Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
(Intercept) 0.18276580 0.030703249 5.952653 4.305921e-08 0.1218203 0.243711268 96
dlpi_lev -0.01822383 0.005546027 -3.285925 1.420160e-03 -0.0292326 -0.007215052 96
Does not agree when we add two way FE:
. tsset ccode2 year
. xtreg conflict dlpi_lev yr*, fe robust cluster(ccode2)
note: yr25 omitted because of collinearity
Fixed-effects (within) regression Number of obs = 2,425
Group variable: ccode2 Number of groups = 97
R-sq: Obs per group:
within = 0.0364 min = 25
between = 0.0027 avg = 25.0
overall = 0.0159 max = 25
F(25,96) = 1.20
corr(u_i, Xb) = 0.0042 Prob > F = 0.2602
(Std. Err. adjusted for 97 clusters in ccode2)
------------------------------------------------------------------------------
| Robust
conflict | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
dlpi_lev | -.0108859 .0133478 -0.82 0.417 -.0373811 .0156093
(year FE omitted)
_cons | .2603708 .0315889 8.24 0.000 .1976673 .3230742
-------------+----------------------------------------------------------------
sigma_u | .29765047
sigma_e | .24593433
rho | .59428563 (fraction of variance due to u_i)
------------------------------------------------------------------------------
lm_robust(conflict ~ dlpi_lev, fixed_effects = ~ year + ccode2, clusters = ccode2, se_type = "stata", data = repl)
Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
dlpi_lev -0.01088586 0.01362317 -0.7990694 0.4262228 -0.03792764 0.01615592 96
There's a stata note yr25 omitted - can you check if we match when you manually drop that year?
~Also check your design is in fact balanced, IIRC we don't yet support imbalanced designs for multiple FEs.~ It looks like all groups have 25 in stata.
~This comment makes me think we have some numerical flakyness in the FE standard errors:~
https://github.com/DeclareDesign/estimatr/blob/a1215e01245a592833106fcb73b833f94bba8fc6/R/estimatr_lm_robust.R#L67-L70
Looking at the #s in the output, is there a difference in a denominator somewhere relating to n within groups ? eg:
> (.0133478 / 0.01362317)**2
[1] 0.9599818656
> 24/25
[1] 0.96
Could be coincidence but that's really close. Can you increase the cformat in stata and rerun? Also try excluding a year and see if the ratio changes to 23/24.