Binary/discrete outcome for CausalForestDML

Open • leowu4ever opened this issue 2 years ago • 1 comment

Is there any configuration I need to specify apart from passing a classifier as the model_y if the outcome is binary/discrete?

leowu4ever, May 23 '23 10:05

DML has no way to model a discrete outcome directly. The first-stage fitting logic simply calls predict on whatever Y model you pass in and subtracts the result from the true outcome. In the binary case (with 0/1 labels), a classifier's predict returns a hard label, so every residual lies in {-1, 0, 1}. That might work, but it is probably inferior to residualizing against predict_proba. The best workaround is to use a regressor rather than a classifier when an equivalent exists (e.g. RandomForestRegressor instead of RandomForestClassifier), since a regressor fit to 0/1 labels predicts an estimate of P(Y=1).
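To make the difference concrete, here is a minimal sketch that reproduces the residualization step by hand with scikit-learn. It is not EconML's internal code. The synthetic data, the hyperparameters, and the use of cross_val_predict as a stand-in for the cross-fitted first stage are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import cross_val_predict

# Synthetic binary outcome (illustrative data, not from the issue).
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 3))
p = 1 / (1 + np.exp(-X[:, 0]))
Y = rng.binomial(1, p)

# Classifier as outcome model: predict() returns hard 0/1 labels,
# so each residual Y - predict(X) is -1, 0, or 1.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf_pred = cross_val_predict(clf, X, Y, cv=2)
res_clf = Y - clf_pred

# Regressor as outcome model: predict() returns an average of 0/1 labels,
# i.e. an estimate of P(Y=1 | X), so the residuals are continuous.
reg = RandomForestRegressor(n_estimators=50, min_samples_leaf=10, random_state=0)
reg_pred = cross_val_predict(reg, X, Y, cv=2)
res_reg = Y - reg_pred

print("classifier residual values:", np.unique(res_clf))
print("distinct regressor residuals:", len(np.unique(res_reg)))
```

Under this workaround, the regressor (not the classifier) is what would be passed as model_y.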

If there are more than two labels, then unfortunately I don't think there is currently a cleaner solution than to preprocess the outcome: one-hot encode it, drop one column (since it is linearly dependent on the others), and use a regressor as the outcome model. The output of effect is then interpreted as the change in probability for each retained column. The change in probability for the dropped column is the negative of the sum of the others, so the changes across all labels sum to zero.
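The preprocessing and the sum-to-zero interpretation can be sketched with NumPy alone. This example uses ordinary least squares as a simple stand-in for the effect estimator, not CausalForestDML itself. The synthetic data and all variable names are hypothetical. With EconML, the retained columns (Y_kept here) would play the role of the outcome passed to fit.

```python
import numpy as np

# Synthetic 3-class outcome whose distribution shifts with a binary
# treatment T (illustrative data only).
rng = np.random.default_rng(0)
n = 500
T = rng.integers(0, 2, size=n)
X = rng.normal(size=n)
logits = np.column_stack([np.zeros(n), 0.8 * T + 0.5 * X, -0.5 * T])
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = np.array([rng.choice(3, p=row) for row in probs])

# One-hot encode, then drop one column (here the first): it is linearly
# dependent on the others because every row sums to 1.
Y_full = np.eye(3)[labels]   # shape (n, 3)
Y_kept = Y_full[:, 1:]       # shape (n, 2)

# Stand-in effect estimator: OLS of each retained column on [1, T, X].
D = np.column_stack([np.ones(n), T, X])
coef_kept, *_ = np.linalg.lstsq(D, Y_kept, rcond=None)
effect_kept = coef_kept[1]   # change in P(class 1) and P(class 2) due to T

# The dropped class's effect is the negative of the sum of the others.
effect_dropped = -effect_kept.sum()

# Sanity check: fitting all three columns directly gives the same value.
coef_full, *_ = np.linalg.lstsq(D, Y_full, rcond=None)
print(np.isclose(effect_dropped, coef_full[1, 0]))
```

The identity holds exactly for OLS because the dropped column equals one minus the sum of the retained columns. For a nonlinear outcome model it is the interpretation described above rather than a guaranteed numerical equality.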

kbattocchi, Jun 08 '23 16:06