Method in diagnostics tutorial over-rejects
This issue concerns a method proposed in the tutorial *Evaluating a causal forest fit*.
One of the heuristic methods for detecting heterogeneity described in the tutorial over-rejects under the null, I believe because of the winner's curse: the same model is used both to define the subgroups and to estimate the effects within them. Cross-fitting resolves this problem, and I have proposed a small modification to the tutorial along these lines in a pull request (#1502).
Description of the bug
```r
# Subgroups are defined from the forest's own predictions...
tau.hat <- predict(cf)$predictions
high.effect <- tau.hat > median(tau.hat)

# ...and subgroup effects are then estimated with the same forest.
ate.high <- average_treatment_effect(cf, subset = high.effect)
ate.low <- average_treatment_effect(cf, subset = !high.effect)

# 95% confidence interval for the difference in subgroup ATEs.
ate.high[["estimate"]] - ate.low[["estimate"]] +
  c(-1, 1) * qnorm(0.975) * sqrt(ate.high[["std.err"]]^2 + ate.low[["std.err"]]^2)
#> [1] 0.6591796 1.0443646
```
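For illustration, a cross-fitted variant of the check above could look like the sketch below. This is only a sketch, not the exact change in the pull request: the names `X`, `Y`, and `W` (covariates, outcome, treatment) and the two-fold split are assumptions for the example.

```r
library(grf)

# Randomly split the sample into two folds.
n <- nrow(X)
fold <- sample(rep(1:2, length.out = n))

# Fit a forest on fold 1 only, and use it to rank fold-2 units.
cf.train <- causal_forest(X[fold == 1, ], Y[fold == 1], W[fold == 1])
tau.hat.test <- predict(cf.train, X[fold == 2, ])$predictions
high.effect <- tau.hat.test > median(tau.hat.test)

# Estimate subgroup ATEs with a separate forest fit on fold 2, so the
# subgroup labels do not depend on these units' own estimated effects.
cf.test <- causal_forest(X[fold == 2, ], Y[fold == 2], W[fold == 2])
ate.high <- average_treatment_effect(cf.test, subset = high.effect)
ate.low <- average_treatment_effect(cf.test, subset = !high.effect)

# 95% confidence interval for the difference in subgroup ATEs.
ate.high[["estimate"]] - ate.low[["estimate"]] +
  c(-1, 1) * qnorm(0.975) * sqrt(ate.high[["std.err"]]^2 + ate.low[["std.err"]]^2)
```

Because the subgroups are defined by a model that never saw the estimation fold, the winner's-curse bias in the subgroup comparison is removed.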
Steps to reproduce
Even when the sharp null is true (i.e., treatment has no effect on any unit's outcome), this method rejects at higher than nominal rates.
I used the code from the tutorial and created a gist demonstrating the over-rejection here:
https://gist.github.com/mollyow/c1690ac8fd4a8d333d61cdefeeef82a9
A longer write-up is available here: https://alexandercoppock.com/testing_with_grf.pdf
GRF version 2.4.0 (though the issue concerns the tutorial, not the underlying code).