CATE: Continuous Outcome and Binary/Categorical Variables

Open kangqiao-ctrl opened this issue 4 years ago • 2 comments

Hi,

I'm working on estimating CATE from a dataset with mixed data types. For instance: a binary treatment, two categorical effect modifiers (one of them binary), and a continuous outcome.

I started with Linear Regression since the dataset is really simple and the causal relationship is quite straightforward as well. The result is not bad. I did get something interesting.

Here is a small issue. I noticed that DoWhy always tries to treat variables as continuous whenever possible. For instance, it will qcut my binary effect modifier and generate five (-0.001, 1] intervals, but I would prefer to treat True and False as two separate conditions. I checked causal_estimator and noticed that anything that passes is_numeric_dtype() gets qcut.
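To make the behaviour concrete, here is a minimal pandas-only sketch. It assumes the binning step behaves like pd.qcut with five quantiles and duplicates="drop"; this is an assumption for illustration, not code copied from DoWhy, and the column name is_member is made up.

```python
import pandas as pd

# Hypothetical binary effect modifier stored as 0/1, so it counts as numeric.
modifier = pd.Series([0, 1] * 5, name="is_member")
assert pd.api.types.is_numeric_dtype(modifier)

# Assumed binning step: five quantile bins, duplicate bin edges dropped.
binned = pd.qcut(modifier, 5, duplicates="drop")

# Every quantile edge is either 0 or 1, so both values fall into one interval.
n_bins = binned.nunique()

# Grouping on the raw column keeps the two conditions apart.
n_groups = modifier.groupby(modifier).ngroups

print(binned.cat.categories)
print(n_bins, n_groups)
```

With this toy column the quantile binning cannot separate 0 from 1, while a plain groupby on the original values gives the two conditions I am after.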

My question is: is there any way to use linear regression to calculate CATE with my original categorical data (that is, skip the qcut step and groupby directly)? Or can it only be done through the ML algorithms in EconML or CausalML?
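As a rough sketch of what I mean by "directly groupby", outside of DoWhy: fit a separate one-variable linear regression of the outcome on the treatment within each level of the categorical modifier and read off the slope. The data below is simulated, the column names are hypothetical, and the treatment is randomized, so no confounder adjustment is needed here; real data would need the backdoor variables added to each regression.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000

# Simulated data: binary treatment, binary effect modifier, continuous outcome.
df = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),
    "is_member": rng.integers(0, 2, n).astype(bool),
})
# True effect is 3.0 when is_member is True and 1.0 otherwise.
true_effect = np.where(df["is_member"], 3.0, 1.0)
df["outcome"] = 0.5 + true_effect * df["treatment"] + rng.normal(0, 0.1, n)

def ols_slope(group: pd.DataFrame) -> float:
    """Slope of outcome on treatment within one stratum (simple OLS)."""
    slope, _intercept = np.polyfit(
        group["treatment"].astype(float), group["outcome"], deg=1
    )
    return float(slope)

# One regression per level of the categorical modifier, no quantile binning.
cate = {bool(level): ols_slope(g) for level, g in df.groupby("is_member")}
print(cate)
```

With a binary treatment and no other covariates, each slope is just the difference in mean outcome between treated and untreated units in that stratum.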

Another side question: the comment for target_unit implies that the parameter can be used as "the condition" for calculating CATE by passing a lambda function. I also noticed this usage in the CATE notebook. Could you please explain it a little further?

P.S. I've been using DoWhy for half a year now. Thanks for the great work.

kangqiao-ctrl, Apr 06 '21 04:04

Thank you for raising this @kangqiao-ctrl. Yeah, you are right about the qcut functionality. It should not be happening for binary variables. Let me try to add a fix and update you here. (thanks for your patience, btw. Was away for the last few weeks.)

amit-sharma, Apr 21 '21 07:04

What if I want to specify a fixed number of qcut bins? Which part of the code should I change, or what should I do?
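For reference, this is what a fixed number of bins looks like in plain pandas, where the q argument of pd.qcut sets the bin count. It is only a sketch of the pandas call on a made-up column; it does not show where DoWhy sets its own number of quantiles.

```python
import numpy as np
import pandas as pd

# Hypothetical continuous effect modifier with 100 distinct values.
modifier = pd.Series(np.arange(100), name="age")

# q fixes the number of quantile bins; here four roughly equal-sized bins.
binned = pd.qcut(modifier, q=4)

n_bins = binned.nunique()
bin_sizes = binned.value_counts().tolist()
print(binned.cat.categories)
print(n_bins, bin_sizes)
```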

wxl112 avatar Nov 18 '22 10:11 wxl112