grf
GATES and CLAN features
The paper by Chernozhukov et al. develops Sorted Group Average Treatment Effects (GATES) and Classification Analysis (CLAN), in addition to the Best Linear Predictor (BLP), to estimate and bound average treatment effects within groups induced by the random forest predictor (see page 6).
Are there any plans to make measures like GATES and CLAN available in order to determine ATE estimates and standard errors for individual quantile-groups of tau.hat?
I am currently attempting to measure the ATE and a standard deviation for the top quartile of tau.hat via:
cf.tau.hat <- predict(cf.tau.forest, cf.X.test[, selected.vars], estimate.variance = TRUE)
# select top quartile (for example): predictions above the 75th percentile
cf.top.select <- which(cf.tau.hat$predictions > quantile(cf.tau.hat$predictions, 0.75))
# output average effect for top quartile selection as well as average standard deviation
mean(cf.tau.hat$predictions[cf.top.select])
mean(sqrt(cf.tau.hat$variance.estimates[cf.top.select]))
Thank you.
So first -- we actually already implement the BLP test in the function test_calibration (but haven't really advertised it). If you use this, I'd recommend installing the latest development version from GitHub (or waiting for version 0.10.2, which will hopefully be on CRAN within a week or so).
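For readers who have not used it, a minimal sketch of calling test_calibration on a fitted causal forest might look like the following. The simulated data and the names cf and calib are illustrative assumptions, not part of the original discussion.

```r
library(grf)

set.seed(1)
n <- 1000; p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)  # simple random assignment
# Simulated outcome (an assumption for illustration): the treatment effect
# depends on X[, 1] only, plus a main effect of X[, 2] and noise.
Y <- pmax(X[, 1], 0) * W + X[, 2] + rnorm(n)

cf <- causal_forest(X, Y, W)

# BLP-style calibration test. The returned table has one row for the mean
# forest prediction and one for the differential forest prediction.
calib <- test_calibration(cf)
print(calib)
```

A coefficient on the differential forest prediction that is significantly greater than zero is usually read as evidence that the forest has picked up some heterogeneity.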
For estimating ATEs of quantile buckets: Here the tricky thing is that, with forests, it's much more natural to do out-of-bag (i.e., leave-one-out) prediction rather than k-fold prediction. For most applications, this is not a problem; however, with GATES, I'd like to see a little more work on what happens if we define quantiles using leave-one-out predictions rather than across folds like they do in Chernozhukov et al.
If you want to run experiments, the average_treatment_effect function has a subset argument that can be used to get doubly robust ATE estimates over arbitrary subgroups. This procedure is formally justified if the subsetting rule was exogenously defined using the X; however, it might be interesting to run a simulation study with subsets defined by taking quantiles of out-of-bag CATE estimates.
Thanks for your feedback. I was actually motivated to ask about GATES and CLAN after seeing the BLP implementation in test_calibration. I considered using the average_treatment_effect function with the subset argument, but then saw the warning in the function documentation that the subset must be defined using X only (not W or Y).
For a very simple toy model, if I estimate ATE using average_treatment_effect and the subset argument to select a particular range of responses, it does look like it provides results in the expected range:
library(grf)
# Generate data.
n = 2000; p = 10
X = matrix(rnorm(n*p), n, p)
X.test = matrix(0, 101, p)
X.test[,1] = seq(-2, 2, length.out = 101)
# Train a causal forest.
W = rbinom(n, 1, 0.5) # simple random assignment
Y = pmax(X[,1], 0) * W # response only depends on X[,1]
tau.forest = causal_forest(X, Y, W)
# Estimate treatment effects for the test sample and plot them vs. theoretical.
tau.hat = predict(tau.forest, X.test)
plot(X.test[,1], tau.hat$predictions, ylim = range(tau.hat$predictions, 0, 2), xlab = "x", ylab = "tau", type = "l")
lines(X.test[,1], pmax(0, X.test[,1]), col = 2, lty = 2)
# Estimate the ATE for the subset where X[,1] is between 1.5 and 1.55.
forest.select <- which(X[,1] > 1.5 & X[,1] < 1.55)
average_treatment_effect(tau.forest, subset = forest.select, target.sample = "all")
Of course, I'm not sure if this generalizes to more complex scenarios.
What I'm really curious about is whether something like the snippet below can be rigorously justified... If it can, then it'd make a very nice omnibus test for heterogeneity that could be used in parallel with the BLP test. I haven't had time to look into the formal details of this yet, though.
tau.hat = predict(cf)$predictions
high_effect = tau.hat > median(tau.hat)
ate.high = average_treatment_effect(cf, subset = high_effect)
ate.low = average_treatment_effect(cf, subset = !high_effect)
paste("95% CI for difference in ATE:",
round(ate.high[1] - ate.low[1], 3), "+/-",
round(qnorm(0.975) * sqrt(ate.high[2]^2 + ate.low[2]^2), 3))
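The same idea extends from a median split to the quartile groups asked about at the top of the thread. The following is only a sketch on simulated data (the data-generating process and the names group and group.ate are assumptions). As discussed above, defining subsets by quantiles of out-of-bag CATE estimates has not been formally justified here, so the standard errors should be read with caution.

```r
library(grf)

set.seed(1)
n <- 2000; p <- 10
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)               # simple random assignment
Y <- pmax(X[, 1], 0) * W + rnorm(n)  # effect depends on X[, 1] only

cf <- causal_forest(X, Y, W)

# Out-of-bag CATE estimates (predict() without new data).
tau.hat <- predict(cf)$predictions

# Assign each observation to a quartile of tau.hat (1 = lowest, 4 = highest).
breaks <- quantile(tau.hat, probs = seq(0, 1, by = 0.25))
group <- cut(tau.hat, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Doubly robust ATE estimate and standard error within each quartile.
group.ate <- t(sapply(1:4, function(g) {
  average_treatment_effect(cf, subset = (group == g))
}))
print(group.ate)
```

Each row of group.ate holds the estimate and std.err for one quartile group, so rows can be compared in the same way as ate.high and ate.low above.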
@swager @jtibshirani Is there any plan of adding the GATES functionality to the grf package? I am trying to find out the subgroup that is affected the most by the treatment. I can either do a brute force search (i.e. estimate the ATE for many sub-groups and find the maximum), or use a more systematic way like GATES. Do you see any problem in the brute-force search though?
@ginward if you want a simple description of the heterogeneity, here are two options:
- You could look for the best pruned tree in the forest, using the script in #281
- You could look for the best tree-shaped decision rule, following https://arxiv.org/abs/1702.02896. We hope to have a public GRF-based implementation of the latter available soon.
Thanks @swager. Is there a reason the best pruned tree in the forest has not been merged into the master branch yet?
@ginward the best pruned tree requires more work before it is ready to merge. As for finding the best tree-shaped decision rule, this functionality is now available in the sister package https://github.com/grf-labs/policytree
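A rough sketch of how policytree might be combined with a causal forest is shown below. The simulated data, in which the treatment effect changes sign at X[, 1] = 0, and the variable names are assumptions made for illustration. The depth of 2 is an arbitrary choice.

```r
library(grf)
library(policytree)

set.seed(1)
n <- 1000; p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
# Assumed data-generating process: treatment helps when X[, 1] > 0
# and hurts when X[, 1] < 0.
Y <- X[, 1] * W + rnorm(n)

cf <- causal_forest(X, Y, W)

# Doubly robust reward estimates, one column per action (control, treated).
Gamma <- double_robust_scores(cf)

# Fit a shallow tree-shaped treatment rule on the covariates.
tree <- policy_tree(X, Gamma, depth = 2)
print(tree)

# Predicted action per observation: 1 = control, 2 = treated.
action <- predict(tree, X)
table(action)
```

The printed tree describes the subgroups in terms of covariate splits, which may be easier to interpret than a brute-force search over many candidate subgroups.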
While GATES/CLAN looks like an interesting addition, we have no immediate plans of turning it into a feature. A "GRF native" alternative which can serve a similar purpose is RATE: https://grf-labs.github.io/grf/reference/rank_average_treatment_effect.html (available in v2.1.0+)
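As a hedged illustration of how RATE could be used for this purpose, the sketch below trains one forest to produce a prioritization rule and evaluates it on a held-out half of the data. The simulated data and the names cf.priority, cf.eval, and priority are assumptions, not code from the thread.

```r
library(grf)

set.seed(1)
n <- 2000; p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, 1], 0) * W + rnorm(n)  # effect depends on X[, 1] only

# Split the sample: one half to learn priorities, one half to evaluate them.
train <- sample(1:n, n / 2)

# Forest used only to rank held-out units by estimated CATE.
cf.priority <- causal_forest(X[train, ], Y[train], W[train])
priority <- predict(cf.priority, X[-train, ])$predictions

# Separate forest fit on the held-out half, used for evaluation.
cf.eval <- causal_forest(X[-train, ], Y[-train], W[-train])

# RATE (AUTOC by default) for the prioritization rule.
rate <- rank_average_treatment_effect(cf.eval, priority)
print(rate)

# Approximate 95% confidence interval.
rate$estimate + c(-1, 1) * qnorm(0.975) * rate$std.err
```

An interval that excludes zero suggests the ranking separates units with higher and lower treatment effects, which plays a similar role to comparing GATES across groups.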