Common Problems

Open kylebutts opened this issue 3 years ago • 2 comments

A quick summary of problems I can link to people

kylebutts, Jan 24 '22 16:01

Triple Differences

The following is the standard triple-difference estimator (e.g. Angrist and Pischke, 2008, p.181):

$$ Y_{i g \ell t} = \gamma_{\ell t} + \lambda_{g t} + \theta_{g \ell} + \tau_{g \ell t} D_{g \ell t} + \varepsilon_{i g \ell t}, $$

where $i$ indexes individual observations, $\ell$ indexes regions, $t$ indexes time, and $g$ indexes within-region groups (e.g. male/female, age groups, affected/unaffected by treatment). The fixed effects are region-specific time fixed effects $\gamma_{\ell t}$ (common across groups), group-specific time fixed effects $\lambda_{g t}$ (common across regions), and group-region fixed effects $\theta_{g \ell}$ (common across time).

To implement this in did2s, you can specify the correct first_stage formula:

first_stage = ~ 0 | region^time + group^time + group^region

Then, for example, a second_stage formula with the treatment dummy ($D_{g \ell t}$) will estimate the average treatment effect.
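A minimal sketch of the triple-difference setup described above, on simulated data. The data frame `df`, its column names (`region`, `group`, `time`, `D`, `y`), and every number in the data-generating process are made up for illustration; only the `did2s()` arguments and the first_stage formula come from the text.

```r
library(did2s)

set.seed(42)

# Hypothetical simulated data: 5 individuals in each region x group x time cell
df <- expand.grid(
  id     = 1:5,
  region = 1:20,
  group  = c("A", "B"),
  time   = 1:10,
  stringsAsFactors = FALSE
)

# Treatment switches on at time 6, only for group "B" in regions 1-10
df$D <- df$region <= 10 & df$group == "B" & df$time >= 6

# Outcome with region-time, group-time, and group-region components,
# plus an assumed true treatment effect of 2
df$y <- 0.05 * df$region * df$time +
  0.3 * (df$group == "B") * df$time +
  0.1 * (df$group == "B") * df$region +
  2 * df$D +
  rnorm(nrow(df))

est <- did2s(
  df,
  yname = "y",
  # region-by-time, group-by-time, and group-by-region fixed effects
  first_stage = ~ 0 | region^time + group^time + group^region,
  # a single treatment dummy in the second stage
  second_stage = ~ i(D, ref = FALSE),
  treatment = "D",
  cluster_var = "region"
)

summary(est)
```

In this design every fixed effect can be estimated from untreated observations: group "A" covers each region-time cell, regions 11-20 cover each group-time cell, and the pre-treatment periods cover each group-region cell.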

kylebutts, May 20 '22 17:05

Big Data / Matrix Problems

The main pain point in the code is calculating analytic standard errors. The formula for them requires me to store, in memory, the full (sparse) matrix including fixed effects.

Two sources of problems:

  1. I had to write a bespoke function that builds this matrix from a fixest estimate (fixest makes estimation super fast), and it can be buggy. I have a new version, written with the help of @lrberge, that is more robust. You can try it by installing the GitHub version of the package: remotes::install_github("kylebutts/did2s")

  2. I store the matrix of fixed effects as a sparse matrix (since most entries of a unit or time fixed-effect dummy are zero). This saves a lot of memory, but sometimes even a sparse matrix is too large to hold in memory, typically when there are either many fixed effects or many observations.
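To give a rough sense of the memory point in item 2, here is a small sketch using the Matrix package. It is not how did2s builds its matrix internally; the panel dimensions are arbitrary assumptions, and the dense size is computed rather than allocated.

```r
library(Matrix)

# Hypothetical panel: 1,000 units observed for 20 periods each
n_units   <- 1000
n_periods <- 20
unit      <- rep(seq_len(n_units), each = n_periods)
n_obs     <- length(unit)

# Unit fixed-effect dummies: exactly one nonzero entry per row
fe_sparse <- sparseMatrix(
  i = seq_len(n_obs),
  j = unit,
  x = 1,
  dims = c(n_obs, n_units)
)

# Memory actually used by the sparse representation
sparse_bytes <- as.numeric(object.size(fe_sparse))

# Memory a dense double matrix of the same shape would need (8 bytes per cell)
dense_bytes <- 8 * n_obs * n_units

cat("sparse:", round(sparse_bytes / 1024^2, 2), "MB\n")
cat("dense: ", round(dense_bytes / 1024^2, 2), "MB\n")
```

The sparse size grows with the number of nonzero entries (roughly observations times the number of fixed-effect dimensions), so very large samples can still exhaust memory even though the saving over a dense matrix is large.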

In either case, if your code fails, you can add the function parameter bootstrap = TRUE. In this case, it won’t calculate the standard errors analytically; they are block bootstrapped at the cluster level instead. An example:

library(did2s)

did2s(df_het,
    yname = "dep_var", first_stage = ~ 0 | state + year,
    second_stage = ~ i(treat, ref = FALSE), treatment = "treat",
    cluster_var = "state", bootstrap = TRUE, n_bootstraps = 250
)
#> Running Two-stage Difference-in-Differences
#> • first stage formula `~ 0 | state + year`
#> • second stage formula `~ i(treat, ref = FALSE)`
#> • The indicator variable that denotes when treatment is on is `treat`
#> • Standard errors will be block bootstrapped with cluster `state`
#> • Starting 250 bootstraps at cluster level: state
#> OLS estimation, Dep. Var.: dep_var
#> Observations: 46,500 
#> Standard-errors: Custom 
#>             Estimate Std. Error t value  Pr(>|t|)    
#> treat::TRUE  2.15221   0.046291 46.4928 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 1.41487   Adj. R2: 0.337905

Created on 2022-05-20 by the reprex package (v2.0.1)

kylebutts, May 20 '22 17:05