econtools
Differences in F-scores with Stata when using clustering
The largest discrepancies between Stata's output and econtools seem to arise when the clustering option is used. On my dataset, econtools reproduces Stata's results exactly for the command:
areg y X, absorb(alpha)
However, differences emerge in t and F values for the line
areg y X, absorb(alpha) cluster(alpha)
on the same dataset.
Thanks for the heads up! It's probably a degrees of freedom issue. I'll look into it when I get the chance.
Hi! Thanks for writing econtools. To make it easier to identify the discrepancy, I've written a .do file and a .py file below that demonstrate different output from areg, xtreg, reghdfe and econtools. I've also shown how to reconcile the differences (though there still seems to be a small difference with econtools).
The key difference appears to be the finite-sample modifications. In particular, a discrepancy arises when the clusters are nested within the fixed-effects. This is discussed in the reghdfe FAQ.
xtreg and reghdfe appear to be identical in this case. econtools appears to be very close to xtreg/reghdfe. Based on my understanding, the results from xtreg/reghdfe/econtools are preferable to areg when clusters are nested within the fixed-effects.
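To make the finite-sample adjustment concrete, here is a minimal sketch of the rescaling factors used in the .do file below. The function name `cluster_dof_factors` is hypothetical, and the sample sizes (5000 observations, 500 firms, one regressor) are my assumption about the Petersen test data rather than something printed in this thread; the areg variance for x is copied from the output below.

```python
def cluster_dof_factors(n_obs, n_clusters, n_regressors):
    """Multipliers that convert an areg clustered variance to other conventions.

    Mirrors the formulas in the .do file: areg counts the absorbed fixed
    effects in its degrees of freedom, the other estimators do not.
    """
    areg_dof = n_obs - n_regressors - n_clusters
    return {
        # xtreg and reghdfe: regressors plus a constant
        "reghdfe_xtreg": areg_dof / (n_obs - n_regressors - 1),
        # econtools: regressors only, no constant
        "econtools": areg_dof / (n_obs - n_regressors),
    }


# Assumed sample sizes for the Petersen test data (not stated in the thread).
factors = cluster_dof_factors(n_obs=5000, n_clusters=500, n_regressors=1)

# Variance of x reported by areg in the output below.
areg_var_x = 100950.97
rescaled = {name: areg_var_x * factor for name, factor in factors.items()}

for name, value in rescaled.items():
    print(f"{name}: {value:.2f}")
```

Under those assumptions the two rescaled values land within rounding error of the xtreg/reghdfe and econtools variances listed below, which is consistent with the gap being purely a degrees-of-freedom convention.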
econtools_example.do
use "https://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.dta"
save "test_data.dta", replace
replace x = x / 100
*------*
* areg *
*------*
areg y x, absorb(firmid) vce(cluster firmid)
matrix define areg_V = e(V)
*-------*
* xtreg *
*-------*
xtset firmid
xtreg y x, fe vce(cluster firmid)
matrix define xtreg_V = e(V)
xtreg y x, fe vce(cluster firmid) dfadj
matrix define xtreg_dfadj_V = e(V)
*---------*
* reghdfe *
*---------*
reghdfe y x, absorb(firmid) vce(cluster firmid)
matrix define reghdfe_V = e(V)
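* Sample quantities stored by the reghdfe fit, reused for the rescalings below
* G: number of clusters, N: number of observations, K: rank reported by reghdfe
local G = `e(N_clust)'
local N = `e(N_full)'
local K = `e(rank)'
* Undo areg's degrees-of-freedom penalty for the G absorbed fixed effects
matrix areg_to_reghdfe_V = areg_V * (`N' - `K' - `G') / (`N' - `K' - 1)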
*-----------*
* econtools *
*-----------*
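* Same rescaling, but with N - K in the denominator instead of N - K - 1;
* this appears to be the convention that reproduces the econtools output
matrix areg_to_econtools_V = areg_V * (`N' - `K' - `G') / (`N' - `K')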
* areg
matrix list areg_V
* symmetric areg_V[2,2]
*                 x       _cons
*     x   100950.97
* _cons  -.05422367   2.913e-08
* xtreg
matrix list xtreg_V
* symmetric xtreg_V[2,2]
*                 x       _cons
*     x   90872.034
* _cons  -.04880999   2.622e-08
* xtreg with dfadj => areg
matrix list xtreg_dfadj_V
* symmetric xtreg_dfadj_V[2,2]
*                 x       _cons
*     x   100950.97
* _cons  -.05422367   2.913e-08
* reghdfe
matrix list reghdfe_V
* symmetric reghdfe_V[2,2]
*                 x       _cons
*     x   90872.034
* _cons  -.04880999   2.622e-08
* convert areg to reghdfe
matrix list areg_to_reghdfe_V
* symmetric areg_to_reghdfe_V[2,2]
*                 x       _cons
*     x   90872.034
* _cons  -.04880999   2.622e-08
* convert areg to econtools
matrix list areg_to_econtools_V
* symmetric areg_to_econtools_V[2,2]
*                 x       _cons
*     x   90853.856
* _cons  -.04880022   2.621e-08
econtools_example.py
import pandas as pd
import econtools
import econtools.metrics as mt
# Read Stata .dta file
test_data = econtools.read("test_data.dta")
test_data["x"] *= 1 / 100
# Estimate OLS regression with fixed-effects and clustered s.e.'s
result = mt.reg(test_data, "y", "x", fe_name="firmid", cluster="firmid")
print(result.vce)
#              x
# x  90853.85922
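The remaining gap between econtools and xtreg/reghdfe can be closed after the fact. This is a rough sketch, not part of econtools: `econtools_to_reghdfe` is a hypothetical helper, and it assumes the only difference is N - K versus N - K - 1 in the denominator, as the formulas in the .do file suggest. The sample size of 5000 is again my assumption about the test data.

```python
def econtools_to_reghdfe(variance, n_obs, n_regressors):
    """Rescale an econtools clustered variance to the xtreg/reghdfe convention.

    Assumes the two differ only in whether the constant is counted
    in the degrees of freedom (N - K versus N - K - 1).
    """
    return variance * (n_obs - n_regressors) / (n_obs - n_regressors - 1)


# Variance of x printed by result.vce above; n_obs is an assumed value.
adjusted = econtools_to_reghdfe(90853.85922, n_obs=5000, n_regressors=1)
print(round(adjusted, 3))
```

If the adjusted value agrees with the reghdfe variance to within rounding, the discrepancy is just the constant term in the degrees-of-freedom count.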