EconML
How to use known treatment probabilities in doubly robust learners
@kbattocchi
Hi Keith,
How would you recommend handling a case where we know the true treatment probabilities? I'd prefer to use them to avoid having to fit the model_propensity (in a doubly robust model, say ForestDRLearner).
A few options:
- Pass the (inverse) probabilities as sample_weight to fit. But then we need to choose something for model_propensity, perhaps just a dummy classifier?
- Create a trivial model_propensity that takes the probability as a feature and returns the same probability. But then we need some workaround to prevent the model_regression from using the probability as a feature (since DRLearner will always pass X, W to both model_propensity and model_regression). Maybe we can use a sklearn pipeline with a transformer for this.
Thanks! Kyle
If the probabilities are the same for every instance, then I'd just use sklearn.dummy.DummyClassifier(), which uses the 'prior' strategy by default and thus will output the empirical probability as the result of predict_proba.
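To illustrate the constant-probability case, here is a minimal sketch using synthetic data (the arrays and the 0.3 assignment rate are made up for the example). It only shows what DummyClassifier returns from predict_proba; it does not involve EconML itself.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

rng = np.random.default_rng(0)
n = 1000
XW = rng.normal(size=(n, 3))          # stand-in for the concatenated X, W (ignored by the dummy)
T = rng.binomial(1, 0.3, size=n)      # synthetic treatment with a constant assignment rate

propensity = DummyClassifier()        # strategy="prior" is the default
propensity.fit(XW, T)
proba = propensity.predict_proba(XW)  # every row is [share of T == 0, share of T == 1]

print(proba[0], T.mean())
```

Note that this returns the empirical treatment share in the data passed to fit, which is close to but not exactly the known design probability.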
If the probabilities vary but are known, I think adding them as the last column of W and then using make_column_transformer(('passthrough', -1)) as your propensity model should be fine, even without doing anything to filter that column from the regression model's input (without giving it a ton of thought, I can't see how knowing the true probability should bias the regression). But if you really want to, it should also be possible to use a column transformer that drops the column instead of passing it through, and to pipeline that transformer with your real model (but note that we concatenate X, W, and the one-hot-encoded T with its first column removed as inputs to the regression model, so the column to drop is no longer the very last one).
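A minimal sketch of the varying-probability idea, not a tested EconML recipe. A ColumnTransformer on its own exposes transform rather than predict_proba, so this sketch assumes the propensity model has to behave like a classifier and wraps the passthrough idea in a small hypothetical class, KnownPropensityClassifier (an illustrative name, not part of sklearn or EconML). It also assumes a binary treatment and that the known P(T=1) is the last column of the input the propensity model receives.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin


class KnownPropensityClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical helper: returns the known P(T=1) stored in one input column."""

    def __init__(self, column=-1):
        self.column = column  # position of the known-probability column

    def fit(self, X, y, sample_weight=None):
        # Nothing is estimated; only the class labels are recorded.
        self.classes_ = np.unique(y)
        return self

    def predict_proba(self, X):
        p = np.asarray(X)[:, self.column].astype(float)
        return np.column_stack([1.0 - p, p])  # columns: P(T=0), P(T=1)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


# Synthetic data: the known propensities are placed in the last column of W.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
p = rng.uniform(0.2, 0.8, size=n)
T = rng.binomial(1, p)
W = p.reshape(-1, 1)
XW = np.hstack([X, W])  # assumed shape of the propensity model's input

model = KnownPropensityClassifier().fit(XW, T)
proba = model.predict_proba(XW)
```

Under those assumptions, the class would be passed as model_propensity=KnownPropensityClassifier() to ForestDRLearner, with the probabilities supplied as the last column of W in fit. This has not been checked against a particular EconML version.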
This is for varying probabilities. Using make_column_transformer with W is helpful, since it keeps the probabilities out of the CATE model, where they don't make sense.
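For the optional step mentioned earlier in the thread, hiding the probability column from the regression model, here is a rough sketch under the same assumptions (binary treatment, probabilities as the last column of W). The array Z below only mimics the regression input described above, [X, W, one-hot T without its first column], so the probability sits at index -2; LinearRegression is just a placeholder for a real outcome model.

```python
import numpy as np
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 2))
p = rng.uniform(0.2, 0.8, size=n)   # known propensities (last column of W)
T = rng.binomial(1, p)
Y = X[:, 0] + 2.0 * T + rng.normal(scale=0.1, size=n)

# Mimic the regression model's input for a binary treatment: [X, W, T].
Z = np.column_stack([X, p, T])

# Drop the probability column (second to last) and keep everything else.
drop_prob = make_column_transformer(("drop", [-2]), remainder="passthrough")
regression = make_pipeline(drop_prob, LinearRegression())
regression.fit(Z, Y)

Z_kept = regression[0].transform(Z)  # what the placeholder model actually sees
print(Z_kept.shape)
```

With more than two treatment levels, the number of one-hot treatment columns grows, so the index of the column to drop would need to be adjusted accordingly.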
@kyleco @kbattocchi Sounds exactly like the problem I'm having; do you have a code snippet available where you implemented the make_column_transformer(('passthrough', -1)) workaround? I want to include varying but known treatment probabilities in a ForestDRLearner; thank you!