Multiple Treatments with EconML
Hi, I greatly enjoy the EconML library. However, there is an issue with multiple treatments that I could not figure out, and I would really appreciate your help. Here is a brief description of my problem:

I have two binary treatment columns (email_campaign, social_media_ad), an X variable, and a binary outcome. I fit a CausalForestDML on a combined treatment, and I also fit a separate CausalForestDML for each treatment individually. Why do I get different ATE results? With the combined treatment, why does setting T0=0, T1=2 give a different ATE than a separate model with only the email_campaign treatment? The combined treatment column is 0 when both email_campaign and social_media_ad are 0, 1 when social_media_ad is 1 and email_campaign is 0, 2 when email_campaign is 1 and social_media_ad is 0, and 3 when both are 1. A sample of the data is:
```python
import pandas as pd
import numpy as np
from econml.dml import CausalForestDML
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

np.random.seed(123)

# Sample data (replace with your actual data)
data = pd.DataFrame({
    'Customer ID': range(1, 1001),
    'Sales': np.random.randint(0, 1000, 1000),
    'churn': np.random.randint(0, 2, 1000),
    'Email Campaign': np.random.randint(0, 2, 1000),
    'Social Media Ad': np.random.randint(0, 2, 1000)
})

# Create the combined treatment variable
data['Combined Treatment'] = data['Email Campaign'] * 2 + data['Social Media Ad']
data.columns = data.columns.str.lower().str.replace(' ', '_')

# Define features and target variable
X = data[['sales']]
T = data['combined_treatment']
Y = data['churn']

# Initialize the CausalForestDML model
est = CausalForestDML(
    model_t=RandomForestClassifier(random_state=123),
    model_y=RandomForestRegressor(random_state=123),
    discrete_treatment=True,
    random_state=123
)

# Fit the model
model_est = est.fit(Y, T, X=X)
```
The ATE result for each treatment:

```python
est.ate(X, T0=0, T1=1)  # --> -0.0016  social_media_ad (combined_treatment == 1)
est.ate(X, T0=0, T1=2)  # --> -0.033   email_campaign  (combined_treatment == 2)
est.ate(X, T0=1, T1=2)  # --> -0.032
est.ate(X, T0=0, T1=3)  # --> -0.051
```
Email:

```python
est_mail = CausalForestDML(
    model_t=RandomForestClassifier(random_state=123),
    model_y=RandomForestRegressor(random_state=123),
    discrete_treatment=True,
    random_state=123
)
est_mail.fit(Y, data["email_campaign"], X=X)
est_mail.ate(X)  # --> -0.019
```
In the example above, T0=0, T1=2 corresponds to the email_campaign treatment. My question is: why does it yield different results with multiple treatments versus separate treatments? And how should the multiple-treatments approach in EconML be used?

Social media ad:

```python
est_social_media_ad = CausalForestDML(
    model_t=RandomForestClassifier(random_state=123),
    model_y=RandomForestRegressor(random_state=123),
    discrete_treatment=True,
    random_state=123
)
est_social_media_ad.fit(Y, data["social_media_ad"], X=X)
est_social_media_ad.ate(X)  # --> 0.010
```
In the example above, T0=0, T1=1 corresponds to the social_media_ad treatment. The result from the multiple-treatment model is negative, but in the single-treatment model it is positive. Why?
Note:
1. I even get contrasting (negative vs. positive) results when running on different datasets.
2. I get inconsistent results even when the two treatments never overlap, i.e. each customer receives at most one treatment.
Best
At least with the sample data in your example, the confidence intervals are pretty wide (e.g. (-0.37, 0.37) for est.ate_interval(X, T0=0, T1=1)), so the point estimate from each estimator falls well within the other's confidence interval; I wouldn't worry about it.
It's not surprising that the point estimates aren't exactly the same: we stratify on treatment when creating samples for cross-fitting, so the estimators aren't seeing exactly the same samples, and the treatment models will behave slightly differently since they're predicting different things (email vs. not email in one case, as opposed to distinguishing between all of None, Email, Social, Both in the other).
Hi, I guess this example will answer your question https://github.com/py-why/EconML/blob/main/notebooks/Double%20Machine%20Learning%20Examples.ipynb
Thank you very much! I have a few other questions, though:
1. For a binary (discrete) treatment and binary outcome, should model_t and model_y both be classifiers?
2. Does a negative ATE mean the treatment decreases churn, and a positive ATE that it increases churn?
3. How should the ATE be interpreted? As the probability of churn?
> Hi, I guess this example will answer your question https://github.com/py-why/EconML/blob/main/notebooks/Double%20Machine%20Learning%20Examples.ipynb
Thanks!
> Thank you very much! I have a few other questions, though. 1- For binary (discrete) treatment and binary outcome, should model_t and model_y both be classifiers? 2- Does a negative ATE mean the treatment decreases churn, and a positive ATE that it increases it? 3- How should the ATE be interpreted? As the probability of churn?
- Yes, pass `discrete_treatment=True` and `discrete_outcome=True` and then use classifiers for both models.
- A negative ATE means that on average the treatment decreases the likelihood of the 'high' outcome. If your outcome is churn, then yes, a negative ATE means it decreases churn.
- The ATE is the average change in the probability of the outcome when the treatment goes from 0 to 1.
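One way to see that interpretation is a plain randomized simulation (no EconML involved) where, by construction, the treatment shifts the churn probability from 0.45 to 0.30, so the true ATE is -0.15:

```python
# Minimal randomized simulation: the treatment lowers the churn probability
# from 0.45 to 0.30, so a simple difference in means should recover an ATE
# close to -0.15 (a 15-percentage-point reduction in churn probability).
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
T = rng.integers(0, 2, n)                  # randomized binary treatment
p_churn = np.where(T == 1, 0.30, 0.45)     # treated customers churn less often
Y = rng.binomial(1, p_churn)               # binary churn outcome

ate_hat = Y[T == 1].mean() - Y[T == 0].mean()
print(ate_hat)                             # close to -0.15
```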
Following the previous questions: I have encoded four treatments into one column, combined_treatment, whose values range from 0 to 15. I am running a CausalForestDML with XGBClassifier, as shown below. However, some of the point estimates are greater than 1 or less than -1, and I get similar results for all of the treatment interactions. If the output of this CausalForestDML model is the probability of the outcome (churn, binary), why do I get point estimates above 1 or below -1? Switching from XGBClassifier to other algorithms such as RandomForestClassifier reduces the number of point estimates outside the (-1, 1) range, but some remain.
```python
model = CausalForestDML(
    model_t=XGBClassifier(),
    model_y=XGBClassifier(),
    discrete_treatment=True
)
est_model = model.dowhy.fit(Y, combined_treatment, X=X, W=W)

output1 = est_model.effect_inference(X_test, T0=0, T1=1)
output2 = est_model.effect_inference(X_test, T0=0, T1=2)
output3 = est_model.effect_inference(X_test, T0=0, T1=3)
# ...
output15 = est_model.effect_inference(X_test, T0=0, T1=15)
```

All of these outputs yield some point estimates outside the (-1, 1) range. If the results are probabilities of the outcome, how can these estimates be interpreted or justified? And if they are not probabilities, how should they be interpreted?
This is the output for T0=0, T1=12:
Really appreciate your input!
Setting discrete_treatment=False does not help either. CausalForestDML and LinearDML do not have discrete_outcome, so I cannot set discrete_outcome=True.
I tried the wrapper class here, https://github.com/py-why/EconML/issues/334#issuecomment-844152110, but it doesn't change the results either.
I'm a bit confused by your last statement - both CausalForestDML and LinearDML do have discrete_outcome arguments to their initializers (and as a side note, if your treatment is discrete you might want to use the DRLearner subclasses instead of DML ones anyway, though this same issue can also happen there).
The basic issue that can cause this type of result is just a kind of extrapolation. Imagine a setting where there's a binary treatment and we've learned first stage models where P(treatment=1) = 0.4 and P(outcome=1) = 0.2 for a given set of characteristics (e.g. for some rare combination of Xs). Then imagine that when we're training our final model, we have only one data point with this set of Xs, and it has treatment=1, outcome=1. Then the "surprise" portion of the outcome is 1-0.2=0.8, and the "surprise" portion of the treatment is 1-0.4=0.6, so the resulting treatment effect we'd calculate for this one-element subset would be 0.8/0.6>1.
As your sample size increases, this problem should become more and more rare (assuming your first stage models get arbitrarily accurate) - as the distribution of observed (treatment, outcome) pairs approaches the true density, it becomes mathematically guaranteed that the computed effect will be in [-1,1].
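The arithmetic of that one-point cell can be written out directly (the numbers are the ones from the example above):

```python
# One observation in a cell where the first-stage models predict
# P(treatment=1) = 0.4 and P(outcome=1) = 0.2, and the observed pair is
# treatment=1, outcome=1. The final stage sees the ratio of the residuals.
p_treat, p_outcome = 0.4, 0.2
t_obs, y_obs = 1, 1

t_resid = t_obs - p_treat        # "surprise" in the treatment: 0.6
y_resid = y_obs - p_outcome      # "surprise" in the outcome: 0.8
local_effect = y_resid / t_resid
print(local_effect)              # 0.8 / 0.6, about 1.33: outside [-1, 1]
```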
Hi, I have a related question. What happens when you run an EconML SparseLinearDML model with multiple treatments where at least one of the treatments is nonzero for every sample?

Are the model results still valid despite not having a "proper control"? E.g. does the model handle this by setting a relative reference level, comparing outcome differences across the different treatment combinations?

I am seeing similar estimates when running such a model vs. removing one of the treatments to have an actual control. I just want to understand why the results still make sense (and that it's not just coincidence). Thanks for your help!