Differences in F-scores with Stata when using clustering

Open visez opened this issue 4 years ago • 2 comments

The largest discrepancies between Stata's output and econtools seem to arise when the clustering option is used. On my dataset, econtools replicates Stata exactly for the command:

areg y X, absorb(alpha)

However, the t and F values differ for the command

areg y X, absorb(alpha) cluster(alpha)

on the same dataset.

visez avatar Mar 06 '20 06:03 visez

Thanks for the heads-up! It's probably a degrees-of-freedom issue. I'll look into it when I get the chance.

dmsul avatar Mar 11 '20 02:03 dmsul

Hi! Thanks for writing econtools. To make it easier to identify the discrepancy, I've written a .do file and a .py file below that demonstrate the different output from areg, xtreg, reghdfe, and econtools. I've also shown how to reconcile the differences (though there still seems to be a small difference with econtools).

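The key difference appears to be the finite-sample adjustment. In particular, a discrepancy arises when the clusters are nested within the fixed effects: areg counts the absorbed fixed effects as estimated parameters in its degrees-of-freedom correction, while xtreg and reghdfe do not. This is discussed in the reghdfe FAQ.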

xtreg and reghdfe give identical results in this case, and econtools comes very close to both. Based on my understanding, the results from xtreg/reghdfe/econtools are preferable to those from areg when clusters are nested within the fixed effects.
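As a rough illustration of the rescaling done in the .do file below, here is a minimal Python sketch. The sample sizes N = 5000 and G = 500 are my assumptions about test_data.dta and are not stated in this thread, K = 1 counts the single regressor x, and the helper names are made up for the example. It only rescales the areg variance for x reported in the output below; it does not call Stata or econtools.

```python
# Assumed sample sizes for test_data.dta (not stated in the thread).
N = 5000  # observations
G = 500   # clusters (firms), which are also the absorbed fixed effects
K = 1     # regressors, here just x

# Variance of the x coefficient as reported by areg in the .do output.
AREG_VAR_X = 100950.97


def areg_to_reghdfe(v, n, k, g):
    """Hypothetical helper: swap areg's N - K - G divisor for N - K - 1."""
    return v * (n - k - g) / (n - k - 1)


def areg_to_econtools(v, n, k, g):
    """Hypothetical helper: swap areg's N - K - G divisor for N - K."""
    return v * (n - k - g) / (n - k)


reghdfe_var_x = areg_to_reghdfe(AREG_VAR_X, N, K, G)
econtools_var_x = areg_to_econtools(AREG_VAR_X, N, K, G)

print(f"xtreg/reghdfe-style variance: {reghdfe_var_x:.3f}")
print(f"econtools-style variance:     {econtools_var_x:.3f}")

# Standard errors scale with the square root of the variance, so t-statistics
# differ by this ratio while the coefficient itself is unchanged.
se_ratio = (AREG_VAR_X / reghdfe_var_x) ** 0.5
print(f"areg SE / reghdfe SE:         {se_ratio:.4f}")
```

Under these assumed values the two rescaled variances land within about 0.01 of the 90872.034 and 90853.856 shown in the Stata listing; the remaining gap is consistent with the areg input being rounded to eight digits.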

econtools_example.do

use "https://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.dta"
save "test_data.dta", replace

replace x = x / 100

*------*
* areg *
*------*
areg y x, absorb(firmid) vce(cluster firmid)
matrix define areg_V = e(V)

*-------*
* xtreg *
*-------*
xtset firmid
xtreg y x, fe vce(cluster firmid)
matrix define xtreg_V = e(V)

xtreg y x, fe vce(cluster firmid) dfadj
matrix define xtreg_dfadj_V = e(V)

*---------*
* reghdfe *
*---------*
reghdfe y x, absorb(firmid) vce(cluster firmid)
matrix define reghdfe_V = e(V)
local G = `e(N_clust)'
local N = `e(N_full)'
local K = `e(rank)'

* Undo areg's N - K - G divisor and apply the N - K - 1 divisor instead
matrix areg_to_reghdfe_V = areg_V * (`N' - `K' - `G') / (`N' - `K' - 1)

*-----------*
* econtools *
*-----------*
* Reuses N, K and G from the reghdfe run above; econtools appears to divide by N - K
matrix areg_to_econtools_V = areg_V * (`N' - `K' - `G') / (`N' - `K')

* areg
matrix list areg_V
* symmetric areg_V[2,2]
*                 x       _cons
*     x   100950.97
* _cons  -.05422367   2.913e-08

* xtreg
matrix list xtreg_V
* symmetric xtreg_V[2,2]
*                 x       _cons
*     x   90872.034
* _cons  -.04880999   2.622e-08

* xtreg with dfadj => areg
matrix list xtreg_dfadj_V
* symmetric xtreg_dfadj_V[2,2]
*                 x       _cons
*     x   100950.97
* _cons  -.05422367   2.913e-08

* reghdfe
matrix list reghdfe_V
* symmetric reghdfe_V[2,2]
*                 x       _cons
*     x   90872.034
* _cons  -.04880999   2.622e-08

* convert areg to reghdfe
matrix list areg_to_reghdfe_V
* symmetric areg_to_reghdfe_V[2,2]
*                 x       _cons
*     x   90872.034
* _cons  -.04880999   2.622e-08

* convert areg to econtools
matrix list areg_to_econtools_V
* symmetric areg_to_econtools_V[2,2]
*                 x       _cons
*     x   90853.856
* _cons  -.04880022   2.621e-08

econtools_example.py

import pandas as pd
import econtools
import econtools.metrics as mt

# Read the Stata .dta file saved by the .do file
test_data = econtools.read("test_data.dta")
# Rescale x, mirroring `replace x = x / 100` in the .do file
test_data["x"] /= 100

# Estimate OLS with firm fixed effects and standard errors clustered by firm
result = mt.reg(test_data, "y", "x", fe_name="firmid", cluster="firmid")

print(result.vce)
#             x
# x  90853.85922

vikjam avatar Dec 11 '20 18:12 vikjam