causalml

heterogeneous treatment effects of treatments

Open abdollahpouri opened this issue 2 years ago • 4 comments

Hello. This is neither a feature request nor a bug report. I was looking for a package to estimate heterogeneous treatment effects of different treatments and found your repo. I ran feature_interpretations_example.ipynb, as it looked the most relevant to what I wanted, and it is. I love the repo. However, in that notebook, it seems you need to build different models

base_algo=LGBMRegressor()

to be able to estimate feature importances later. What I am looking for is something like this: we don't even have the models, only data on how they performed in an A/B test. For example, imagine we have a control and a treatment and we test them on some users (each user has several features F1, F2, ..., FN). We measure a specific metric for each model on different users, and we see that the difference between control and treatment varies across users. How can we find out the importance of each user feature (F1, F2, ..., FN) that may have caused these performance differences? Thanks very much in advance.

abdollahpouri Apr 06 '22 13:04

Following the meta-learner method, you do need to fit a base algo to be able to find the feature importance. Based on your description, you can follow the notebook you mentioned, using F1, F2, ..., FN as the input features of the meta-learner.

vincewu51 May 22 '22 03:05

Thank you. That was very helpful. How can I estimate the treatment effect on users whose F1 feature is less than a certain value? For example, imagine I group the users into 2 groups based on F1 (F1 is age and group A is people who are younger than 20 years and group B is those who are older). How can I calculate the treatment effect on each group?

abdollahpouri Jul 02 '22 17:07

As mentioned on #526, any correlation you see between features and predicted treatment effects is not necessarily causal.

Regarding your second question, if you are simply interested in calculating treatment effects for different subgroups, you should use a multiple regression model of the form y ~ a + b_W*W + b_F1*F1 + b_WF1*W*F1, where W is the treatment indicator, F1 is coded as a binary variable (for example, 1 if age is below 20 and 0 otherwise), and the last term is the interaction between the two. The coefficient b_W is the treatment effect in the F1 = 0 group, and the interaction coefficient b_WF1 tells you how much the treatment effect differs between the two categories of F1.
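A minimal sketch of such an interaction regression, using plain NumPy least squares on synthetic data. The function name `fit_interaction_model`, the coefficient labels, and the simulated effect sizes are illustrative assumptions, not part of causalml or of the reply above.

```python
import numpy as np

def fit_interaction_model(y, w, f1):
    """Ordinary least squares fit of y = a + b_W*W + b_F1*F1 + b_WF1*(W*F1)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    f1 = np.asarray(f1, dtype=float)
    # Design matrix columns: intercept, treatment, binary feature, interaction.
    X = np.column_stack([np.ones_like(y), w, f1, w * f1])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["a", "b_W", "b_F1", "b_WF1"], coef))

# Synthetic A/B data (assumed for illustration only).
rng = np.random.default_rng(0)
n = 4000
w = rng.integers(0, 2, n)    # 1 = treatment, 0 = control
f1 = rng.integers(0, 2, n)   # 1 = younger than 20, 0 = otherwise
y = 1.0 + 0.5 * w + 0.3 * f1 + 2.0 * w * f1 + rng.normal(0.0, 1.0, n)

coefs = fit_interaction_model(y, w, f1)
effect_f1_0 = coefs["b_W"]                   # treatment effect when F1 = 0
effect_f1_1 = coefs["b_W"] + coefs["b_WF1"]  # treatment effect when F1 = 1
print(coefs, effect_f1_0, effect_f1_1)
```

In practice a statistics package that also reports standard errors for the coefficients would be the more convenient choice; the sketch only shows how the subgroup effects are read off the fitted coefficients.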

t-tte Jul 02 '22 20:07

How about this? Assume we have two groups: users with F1 < theta and users with F1 > theta. Can I just split my data into two sets, Data1 and Data2, each containing only the users in one group, fit a separate model on each set, and look at the treatment effect on the users in each set? This way I can see the difference. I am a newbie to causal inference, so please pardon me if this looks too naive.

abdollahpouri Jul 02 '22 20:07

Yes, you can do that, although there is no benefit compared to the multiple regression approach mentioned above. If you decide to split the data yourself, you could use something like a permutation test to understand whether any differences in treatment effects are statistically significant.
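A rough sketch of one way such a permutation test could look, using only the Python standard library. It shuffles the group labels (which users count as Data1 versus Data2) and compares the difference in treatment effects between the two sets. The function names, the synthetic data, and the choice of what to shuffle are assumptions for illustration; other permutation schemes are possible, and this one can be affected by baseline differences between the groups.

```python
import random
from statistics import mean

def treatment_effect(rows):
    """Mean outcome of treated minus mean outcome of control; rows are (w, y) pairs."""
    treated = [y for w, y in rows if w == 1]
    control = [y for w, y in rows if w == 0]
    return mean(treated) - mean(control)

def permutation_test(group_a, group_b, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the difference in treatment effects."""
    rng = random.Random(seed)
    observed = treatment_effect(group_a) - treatment_effect(group_b)
    pooled = group_a + group_b  # new list, so the inputs are not modified
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign users to the two groups at random
        diff = treatment_effect(pooled[:n_a]) - treatment_effect(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed split as one of the permutations.
    return observed, (extreme + 1) / (n_perm + 1)

# Synthetic data (assumed): true effect 2.0 in Data1 (F1 < theta), 0.5 in Data2.
data_rng = random.Random(42)
data1 = [(i % 2, 1.0 + 2.0 * (i % 2) + data_rng.gauss(0, 1)) for i in range(400)]
data2 = [(i % 2, 1.0 + 0.5 * (i % 2) + data_rng.gauss(0, 1)) for i in range(400)]

observed_diff, p_value = permutation_test(data1, data2)
print(observed_diff, p_value)
```

A small p-value suggests the two subgroups respond differently to the treatment, under the assumption that the group labels would be exchangeable if there were no difference.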

t-tte Aug 30 '22 00:08