did2s
Common Problems
A quick summary of problems I can link to people
Triple Differences
The following is the standard triple-difference estimator (e.g. Angrist and Pischke, 2008, p.181):
$$ Y_{i g \ell t} = \gamma_{\ell t} + \lambda_{g t} + \theta_{g \ell} + \tau_{g \ell t} D_{g \ell t} + \varepsilon_{i g \ell t}, $$
where $i$ indexes individual observations, $\ell$ indexes regions, $t$ indexes time, and $g$ indexes within-region groups (e.g. male/female, age groups, affected/unaffected by treatment). The fixed effects include region-specific time fixed effects (common across groups), group-specific time fixed effects (common across regions), and group-region fixed effects (common across time).
To implement this in did2s, you can specify the correct first_stage formula:
first_stage = ~ 0 | region^time + group^time + group^region
Then, for example, a second_stage formula containing the treatment dummy ($D_{g \ell t}$) will estimate the average treatment effect.
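As a rough illustration, the sketch below simulates a small panel and passes the first_stage formula above to did2s. The data set, its column names (region, group, time, treat, y), and the treatment effect of 2 are all made up for this example; treat it as a starting point under those assumptions rather than a definitive recipe.

```r
library(did2s)

set.seed(1)
# Hypothetical data: 20 regions x 2 groups x 10 periods, 5 individuals per cell
df <- expand.grid(
  id = 1:5, region = 1:20, group = c("affected", "unaffected"), time = 1:10,
  stringsAsFactors = FALSE
)
# Assumed design: regions 1-10 adopt at time 6, and only the "affected" group
# in those regions is treated
df$treat <- df$region <= 10 & df$group == "affected" & df$time >= 6
# Outcome with region and time components plus an assumed treatment effect of 2
df$y <- 0.5 * df$region + 0.1 * df$time + 2 * df$treat + rnorm(nrow(df))

est <- did2s(df,
  yname = "y",
  first_stage = ~ 0 | region^time + group^time + group^region,
  second_stage = ~ i(treat, ref = FALSE),
  treatment = "treat",
  cluster_var = "region"
)
est
```

In this design the untreated observations still cover every region-time, group-time, and group-region cell, so the first stage can estimate all of the fixed effects needed for the treated observations.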
Big Data / Matrix Problems
The main pain point in the code is calculating analytic standard errors. The formula for them requires me to store, in memory, the full (sparse) matrix including fixed effects.
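To get a feel for why sparsity matters here, this is a back-of-the-envelope sketch using the Matrix package. The dimensions (n_obs, n_units) are made up, and this is only an illustration of the memory difference, not the code did2s uses internally.

```r
library(Matrix)

# Made-up dimensions for illustration
n_obs <- 50000
n_units <- 1000
unit <- rep(seq_len(n_units), length.out = n_obs)

# Unit fixed-effect dummies: each row has exactly one non-zero entry
fe_sparse <- sparseMatrix(
  i = seq_len(n_obs), j = unit, x = rep(1, n_obs),
  dims = c(n_obs, n_units)
)

# A dense double matrix needs 8 bytes per cell, zeros included
dense_bytes <- n_obs * n_units * 8
sparse_bytes <- as.numeric(object.size(fe_sparse))

# Compare the two footprints in megabytes
c(dense_mb = dense_bytes / 1024^2, sparse_mb = sparse_bytes / 1024^2)
```

The sparse footprint grows with the number of non-zero entries rather than with rows times columns, which is why it can still become too large once there are enough observations or fixed effects.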
Two sources of problems:
- I had to write a bespoke function that builds this matrix from a fixest estimate (fixest makes estimation super fast). This can be buggy. I have a new version, written with the help of @lrberge, that is more robust. You can try it by installing the GitHub version of the package:
  remotes::install_github("kylebutts/did2s")
- I store the matrix of fixed effects as a sparse matrix (since most values of a unit/time fixed effect are zero). This saves a bunch of memory, but sometimes even a sparse matrix is too large to hold in memory. This happens when there are either many fixed effects or many observations.
In either case, if your code fails, you can set the function parameter bootstrap = TRUE. did2s will then skip the analytic calculation and block bootstrap the standard errors instead. An example:
library(did2s)
did2s(df_het,
  yname = "dep_var", first_stage = ~ 0 | state + year,
  second_stage = ~ i(treat, ref = FALSE), treatment = "treat",
  cluster_var = "state", bootstrap = TRUE, n_bootstraps = 250
)
#> Running Two-stage Difference-in-Differences
#> • first stage formula `~ 0 | state + year`
#> • second stage formula `~ i(treat, ref = FALSE)`
#> • The indicator variable that denotes when treatment is on is `treat`
#> • Standard errors will be block bootstrapped with cluster `state`
#> • Starting 250 bootstraps at cluster level: state
#> OLS estimation, Dep. Var.: dep_var
#> Observations: 46,500
#> Standard-errors: Custom
#>             Estimate Std. Error t value  Pr(>|t|)
#> treat::TRUE  2.15221   0.046291 46.4928 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 1.41487 Adj. R2: 0.337905
Created on 2022-05-20 by the reprex package (v2.0.1)