etwfe

Add Wild Cluster Bootstrap Support

Open · s3alfisc opened this issue 1 year ago

This PR adds support for inference via a wild (cluster) bootstrap by adding a bootstrap argument to emfx (OLS only). If bootstrap = TRUE, emfx computes the marginal effects by calling fwildclusterboot::boot_aggregate(), which is a copy of fixest::aggregate().

It currently depends on a fork of fwildclusterboot, which itself depends on a fork of fixest by @kylebutts that adds support for sparse model matrices. In other words, merging this PR requires another PR to be merged into fixest first.

At the moment, this PR simply

  • adds a bootstrap argument to emfx. If bootstrap = TRUE, it will run a wild cluster bootstrap via the fwildclusterboot package
  • as a consequence, fwildclusterboot is added as a soft dependency (in Suggests)
  • at the moment, only type = "simple" and the "clustered" bootstrap are supported
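For readers unfamiliar with the method: the "simple" aggregation is a weighted average of regression coefficients, so the bootstrap boils down to testing a single linear combination of coefficients. The sketch below illustrates a wild cluster restricted bootstrap (null imposed, Rademacher weights drawn once per cluster) for such a combination in base R. It is a minimal illustration under stated assumptions, not fwildclusterboot's implementation; the helper name wild_cluster_boot and the weight vector R are hypothetical, and in the etwfe setting R would hold the aggregation weights.

```r
# Hypothetical helper: minimal wild cluster restricted (WCR) bootstrap with
# Rademacher weights for H0: sum(R * beta) = 0 in a linear model y = X beta + u.
# Illustrative sketch only; NOT fwildclusterboot's implementation.
wild_cluster_boot <- function(y, X, cluster, R, B = 999, seed = 1) {
  set.seed(seed)
  XtX_inv <- solve(crossprod(X))
  groups <- unique(cluster)
  idx_by_group <- split(seq_along(y), factor(cluster, levels = groups))

  # CR0 cluster-robust variance of the linear combination R'beta
  var_comb <- function(resid) {
    meat <- Reduce(`+`, lapply(idx_by_group, function(idx) {
      s <- crossprod(X[idx, , drop = FALSE], resid[idx])  # k x 1 cluster score
      s %*% t(s)
    }))
    drop(t(R) %*% XtX_inv %*% meat %*% XtX_inv %*% R)
  }

  # Unrestricted OLS fit and observed t-statistic
  beta <- drop(XtX_inv %*% crossprod(X, y))
  t_obs <- sum(R * beta) / sqrt(var_comb(drop(y - X %*% beta)))

  # Restricted OLS fit imposing the null R'beta = 0
  lambda <- sum(R * beta) / drop(t(R) %*% XtX_inv %*% R)
  beta_r <- beta - drop(XtX_inv %*% R) * lambda
  fitted_r <- drop(X %*% beta_r)
  resid_r <- y - fitted_r

  t_boot <- replicate(B, {
    # One Rademacher draw per cluster, shared by all observations in that cluster
    v <- sample(c(-1, 1), length(groups), replace = TRUE)
    y_star <- fitted_r + resid_r * v[match(cluster, groups)]
    b_star <- drop(XtX_inv %*% crossprod(X, y_star))
    sum(R * b_star) / sqrt(var_comb(drop(y_star - X %*% b_star)))
  })

  # Symmetric (two-sided) bootstrap p-value
  list(t_stat = t_obs, p_value = mean(abs(t_boot) >= abs(t_obs)))
}

# Usage on simulated clustered data: test the slope coefficient
set.seed(42)
G <- 30; n_g <- 20
cluster <- rep(seq_len(G), each = n_g)
x <- rnorm(G * n_g)
y <- 1 + 0.5 * x + rnorm(G)[cluster] + rnorm(G * n_g)
X <- cbind(1, x)
res <- wild_cluster_boot(y, X, cluster, R = c(0, 1), B = 999)
```

Imposing the null when generating the bootstrap samples is one common design choice for wild cluster bootstraps; an unrestricted variant would instead resample around the unrestricted fit and center the bootstrap t-statistics at the original estimate.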

The PR still

  • [ ] ...requires @kylebutts's PR to be merged into fixest
  • [ ] ...requires fwildclusterboot to be updated afterwards
  • [ ] ...lacks support for the heteroskedastic bootstrap
  • [ ] ...lacks some defensive checks
  • [ ] ...lacks unit tests
  • [ ] ...lacks documentation in the vignette
  • [ ] I will also have to revert all changes to etwfe (they are whitespace-only changes, sorry about that).

It is also worth discussing how to unify the output: without the bootstrap, emfx returns a marginaleffects object, while with the bootstrap it simply returns a data.frame.

Here is some example code:

library(devtools)
install_github("https://github.com/s3alfisc/fwildclusterboot/tree/etwfe-support")
# This should also install Kyle's fork of fixest; if it does not, install it manually:
#install_github("https://github.com/kylebutts/fixest/tree/sparse-matrix")

library(etwfe)
library(fwildclusterboot)
data("mpdta", package="did")

mod = etwfe(
  fml = lemp ~ lpop,
  tvar = year,
  gvar = first.treat,
  data = mpdta,
  #se = "hetero",
  vcov = ~countyreal,
  ssc = fixest::ssc(adj = FALSE, cluster.adj = FALSE)
)
#names(coef(mod))

emfx(mod)
#     Term                 Contrast .Dtreat Estimate Std. Error     z Pr(>|z|)    S  2.5 %  97.5 %
#  .Dtreat mean(TRUE) - mean(FALSE)    TRUE  -0.0506     0.0124 -4.08   <0.001 14.4 -0.075 -0.0263
emfx(mod, bootstrap = TRUE, B = 99999, nthreads = 2)
# Run the wild bootstrap: this might take some time...(but hopefully not too much time =) ).
# |======================================================| 100%
#         Estimate   t value    Pr(>|t|)     [0.025%     0.975%]
# [1,] -0.05062703 -4.078845 6.00006e-05 -0.07550813 -0.02580929
# Warning messages:
#   1: In emfx(mod, bootstrap = TRUE, B = 99999, nthreads = 2) :
#   The bootstrap does not support the ssc() argument `fixef.K='none'`. Using `fixef.K='none' instead. This will lead to a slightly different non-bootstrapped t-statistic`, but will not affect bootstrapped p-values and CIs.
# 2: Matrix inversion failure: Using a generalized inverse instead.
# Check the produced t-statistic, does it match the one of your
# regression package (under the same small sample correction)? If
# yes, this is likely not something to worry about. 

@jtorcasso fyi

s3alfisc, Jul 09 '23 10:07