Method in diagnostics tutorial over-rejects
This issue concerns a method proposed in the tutorial *Evaluating a causal forest fit*.
One of the heuristic methods for detecting heterogeneity described in the tutorial over-rejects under the null, I believe because of the winner's curse: the same model is used both to define the subgroups and to estimate the effects within them. Cross-fitting resolves this problem, and I have proposed a small modification to the tutorial along these lines in a pull request (#1502).
Description of the bug
```r
# Subgroups are defined from the forest's own predictions...
tau.hat <- predict(cf)$predictions
high.effect <- tau.hat > median(tau.hat)

# ...and subgroup effects are then estimated with the same forest.
ate.high <- average_treatment_effect(cf, subset = high.effect)
ate.low <- average_treatment_effect(cf, subset = !high.effect)

# 95% confidence interval for the difference in subgroup ATEs.
ate.high[["estimate"]] - ate.low[["estimate"]] +
  c(-1, 1) * qnorm(0.975) * sqrt(ate.high[["std.err"]]^2 + ate.low[["std.err"]]^2)
#> [1] 0.6591796 1.0443646
```
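For illustration, a cross-fitted variant of the check above could look like the sketch below. This is only a sketch, not the exact change in the pull request: the names `X`, `Y`, and `W` (covariates, outcome, treatment) and the two-fold split are assumptions for the example.

```r
library(grf)

# Randomly split the sample into two folds.
n <- nrow(X)
fold <- sample(rep(1:2, length.out = n))

# Fit a forest on fold 1 only, and use it to rank fold-2 units.
cf.train <- causal_forest(X[fold == 1, ], Y[fold == 1], W[fold == 1])
tau.hat.test <- predict(cf.train, X[fold == 2, ])$predictions
high.effect <- tau.hat.test > median(tau.hat.test)

# Estimate subgroup ATEs with a separate forest fit on fold 2, so the
# subgroup labels do not depend on these units' own estimated effects.
cf.test <- causal_forest(X[fold == 2, ], Y[fold == 2], W[fold == 2])
ate.high <- average_treatment_effect(cf.test, subset = high.effect)
ate.low <- average_treatment_effect(cf.test, subset = !high.effect)

# 95% confidence interval for the difference in subgroup ATEs.
ate.high[["estimate"]] - ate.low[["estimate"]] +
  c(-1, 1) * qnorm(0.975) * sqrt(ate.high[["std.err"]]^2 + ate.low[["std.err"]]^2)
```

Because the subgroups are defined by a model that never saw the estimation fold, the winner's-curse bias in the subgroup comparison is removed.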
Steps to reproduce
Even when the sharp null is true (i.e., treatment has no effect on any unit's outcome), this method rejects at higher than nominal rates.
I used the code from the tutorial and created a gist demonstrating the over-rejection here:
https://gist.github.com/mollyow/c1690ac8fd4a8d333d61cdefeeef82a9
A longer write-up is available here: https://alexandercoppock.com/testing_with_grf.pdf
GRF version 2.4.0 (though the issue concerns the tutorial, not the underlying code).