Handle the binary outcome in metalearners

Open · fullflu opened this issue 6 years ago • 4 comments

Problem

When we use ordinary classification models from sklearn as outcome models, we get poor results because the metalearners call predict, which returns hard 0/1 labels rather than probabilities.

This is presumably because the metalearners in EconML currently assume a continuous outcome.
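A minimal illustration (toy data, not from my actual experiments):

```python
# A classifier's predict() returns hard labels, so outcome residuals
# Y - predict(X) can only take the values {-1, 0, 1}.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y)
print(clf.predict(X))              # hard labels, e.g. [0 0 1 1]
print(clf.predict_proba(X)[:, 1])  # smooth probabilities in (0, 1)
```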

Solution

I modified the predict methods in two ways:

  • creating wrappers around the sklearn models
  • overriding the methods after the metalearners are initialized

I uploaded a Jupyter notebook as a gist to explain the modification; a minimal sketch of the wrapper approach follows.
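The class name below is illustrative; the gist has the full version:

```python
# Wrap a sklearn classifier so that predict() returns P(Y=1 | X),
# letting metalearners treat the binary outcome like a regression target.
from sklearn.base import BaseEstimator, clone


class ClassifierAsRegressor(BaseEstimator):
    def __init__(self, classifier):
        self.classifier = classifier

    def fit(self, X, y, **fit_params):
        self.model_ = clone(self.classifier).fit(X, y, **fit_params)
        return self

    def predict(self, X):
        # Positive-class probability instead of a 0/1 label.
        return self.model_.predict_proba(X)[:, 1]
```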

Discussion and Proposition

I would like to find better solutions for handling the binary outcome (from my point of view, it would not be reasonable to add a binary-outcome option to EconML right now). Do you have any ideas?

I would also appreciate it if you could add a supplementary explanation of binary-outcome handling somewhere in your documentation.

fullflu avatar Jul 28 '19 08:07 fullflu

Thanks for the report.

@moprescu - would it make sense to provide a discrete treatment option for the metalearners like we do for other estimator types?

kbattocchi avatar Jul 30 '19 15:07 kbattocchi

@fullflu We had a brief discussion offline.

You're right that right now the user is responsible for passing regression models rather than classification models when predicting outcomes. We have a prototype of a wrapper for classification models very similar to yours (see RegWrapper), but this currently lives in our prototypes area, not in the main utilities. You're also right that this is something we should call out in the documentation.

Ultimately, this issue goes beyond the metalearners - it affects our other estimators as well, so we'll need to give some thought to whether we'd like to make some broader changes (like adding a discrete_output=False argument to the initializers of all of our estimators that would automatically handle the wrapping internally). For now we don't have any concrete plans to do this, so manually wrapping your classifier is the way to go, but we'll at least try to make the wrapper part of the main utilities in an upcoming release so that users don't need to create their own.
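In the meantime, the manual workaround looks roughly like this (an illustrative sketch, not code from EconML: ClassifierAsRegressor repeats the wrapper sketched earlier in this thread so the example stands alone, the toy data are made up, and the TLearner API shown is the one from around this time):

```python
# Wrap the classifier manually and pass it to a metalearner.
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.ensemble import GradientBoostingClassifier
from econml.metalearners import TLearner


class ClassifierAsRegressor(BaseEstimator):
    def __init__(self, classifier):
        self.classifier = classifier

    def fit(self, X, y):
        self.model_ = clone(self.classifier).fit(X, y)
        return self

    def predict(self, X):
        return self.model_.predict_proba(X)[:, 1]  # P(Y=1), not a label


# Toy data: binary outcome Y, binary treatment T, covariates X.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
T = rng.integers(0, 2, size=500)
Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X[:, 0] + 0.5 * T))))

est = TLearner(models=ClassifierAsRegressor(GradientBoostingClassifier()))
est.fit(Y, T, X=X)
cate = est.effect(X)  # CATE estimated from probabilities rather than labels
```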

kbattocchi avatar Jul 30 '19 16:07 kbattocchi

Thank you for your reply, and I'm sorry for responding so late. I understand your opinions and plans.

Is this issue still open now? If implementing the wrapper for classification models is not trivial, I'll close this issue for now.

fullflu avatar Jul 21 '20 04:07 fullflu

I also encountered this problem in my application of DMLCausalForest, where the estimated CATEs were of implausibly large magnitude. I re-checked the code and extracted the residuals to see what might be happening. To my understanding of the DML Causal Forest,

ΔY = Y − model_y([X, W])
ΔT = T − model_t([X, W])

Then a Causal Forest is used to estimate θ(X) such that ΔY = θ(X) · ΔT + ε.

So θ(X) is strongly driven by the pointwise ratio ΔY / ΔT; if ΔT is a small value near 0 while ΔY is a hard 0 or 1, the estimate blows up.
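A toy calculation shows the scale of the problem (numbers are made up):

```python
# Toy numbers only: a near-zero treatment residual against a hard 0/1
# outcome residual implies a huge pointwise "effect".
dY = 1.0   # outcome residual when Y is a hard label and model_y predicts ~0
dT = 0.01  # treatment residual close to zero
print(dY / dT)  # 100.0 -- the kind of implausibly large CATE I observed
```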

To mend this, I modified econml.dml.dml._FirstStageWrapper.predict to use self._model.predict_proba(...)[:, 1] instead of self._model.predict(...) when the outcome is fitted via a classification model. This is technically helpful, but I'm not sure whether it is theoretically sound.
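Roughly, the change amounts to the following (a sketch of the idea only; the real _FirstStageWrapper internals differ):

```python
# Sketch of the idea behind my patch (illustrative; not the actual
# econml.dml.dml internals).
from sklearn.base import is_classifier

def first_stage_predict(model, features):
    if is_classifier(model):
        # Use P(Y=1) so the outcome residual varies smoothly in (-1, 1)
        # instead of taking only the values {-1, 0, 1}.
        return model.predict_proba(features)[:, 1]
    return model.predict(features)
```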

Tauhmax avatar Nov 25 '21 11:11 Tauhmax