EconML
Use EconML for discrete treatment and disease outcomes
Hi,
I just started using EconML to estimate the effect of a discrete treatment on disease outcomes with DROrthoForest.
For example, I have treatment T = [0,1,0,1,1,1,0,0,1,0,...] and disease outcomes Y = [1,0,1,1,0,1,0,1,1,...].
However, after fitting the model, model.effect() returns results like this:
[-.001 , -.3, -0.025, ...]
Why do I get float results? Shouldn't each result be 0, 1, or -1?
My understanding is that when the model flips the treatment (from 0 to 1, or from 1 to 0), it should predict whether the patient will get the disease or not (0 = no disease, 1 = disease).
The individual treatment effect should then be one of four cases: 0 - 0 = 0, 0 - 1 = -1, 1 - 0 = 1, or 1 - 1 = 0.
Thank you!
The treatment effect is the estimated average effect on Y from moving from T=0 to T=1, given X. So even if every individual outcome takes an integer value, in general the treatment effect will not be constrained to also being integral, since it's an average.
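To make this concrete, here is a small simulation sketch. It does not use EconML, and the disease probabilities (0.30 untreated, 0.20 treated) are made up purely for illustration. Unlike real data, the simulation can see both potential outcomes for every patient, so each individual effect is -1, 0, or 1, yet the average lands near the difference in disease probabilities (about -0.10 here) rather than on an integer.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical disease probabilities, chosen only for illustration.
P_DISEASE_UNTREATED = 0.30
P_DISEASE_TREATED = 0.20

n = 100_000
# Potential outcomes for each simulated patient: 1 = disease, 0 = no disease.
y0 = [1 if rng.random() < P_DISEASE_UNTREATED else 0 for _ in range(n)]
y1 = [1 if rng.random() < P_DISEASE_TREATED else 0 for _ in range(n)]

# Individual effects are always -1, 0 or 1 ...
individual_effects = [treated - untreated for treated, untreated in zip(y1, y0)]
# ... but their average is a fraction.
average_effect = sum(individual_effects) / n

print(sorted(set(individual_effects)))
print(round(average_effect, 3))
```

An estimator such as DROrthoForest targets this kind of average, conditional on X, which is why its output is a float.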
Thank you @kbattocchi 👍 Here is how I understand your explanation and the relevant papers. Could you please correct me if I am wrong? Let's say we have a treatment T in {0, 1}, X in {0, 1} encoding patient gender, and other confounders/variables in W. I run the following code to calculate the conditional average treatment effect (CATE) given X:
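from econml.orf import DROrthoForest
from econml.sklearn_extensions.linear_model import WeightedLasso

# Y, T, X, W are the arrays described above
est = DROrthoForest(n_trees=100,
                    max_depth=5,
                    model_Y=WeightedLasso(alpha=0.01))
est.fit(Y, T, X=X, W=W)
hte = est.effect(X)  # one CATE estimate per row of X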
Then the value computed in hte for each individual is the average of the predicted values of the corresponding leaves across all trees.
Is this right? If so, I'd expect the hte values for individuals of the same gender to be approximately the same.
The result I get is the following:
0, 0.113418
1, 0.113417
2, 0.113422
3, 0.113421
4, 0.175136
5, 0.113424
6, 0.175135
7, 0.175137
8, 0.113426
9, 0.175133
10, 0.175134
The values fall into two groups, one per gender, but differ slightly within each group (e.g., 0.113417 and 0.113422). Is this difference caused by floating-point arithmetic? In other words, can they be considered the same treatment effect?
Thanks again
Your understanding is basically right, but the OrthoForest logic is a bit unusual compared to other estimators. At fit time we create a bunch of trees, but at predict time (when we compute the effect) there is another model-fitting step in which we compute weighted local effects. This second stage of fitting is what causes you to get slightly different results even when the inputs are the same.
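As a rough sanity check on the floating-point question, the sketch below takes the values posted above and compares the spread within each cluster to the float64 machine epsilon. The 0.15 cut-off between clusters is an arbitrary choice for this example. The spreads are on the order of 1e-5, many orders of magnitude above what rounding error alone would typically produce, which fits the explanation that the variation comes from the model refitting rather than from floating-point arithmetic.

```python
import sys

# Effect estimates copied from the output posted earlier in this thread.
hte = [0.113418, 0.113417, 0.113422, 0.113421, 0.175136, 0.113424,
       0.175135, 0.175137, 0.113426, 0.175133, 0.175134]

# Split into the two clusters; 0.15 is an arbitrary cut between them.
low = [v for v in hte if v < 0.15]
high = [v for v in hte if v >= 0.15]

spread_low = max(low) - min(low)
spread_high = max(high) - min(high)

print(f"spread in lower cluster:  {spread_low:.1e}")
print(f"spread in higher cluster: {spread_high:.1e}")
print(f"float64 machine epsilon:  {sys.float_info.epsilon:.1e}")
```

For practical purposes the values within a cluster can still be read as the same effect, since the spread is tiny relative to the estimates themselves.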
For now, one workaround is to ensure that the models being fit always produce the same estimate given fixed input data. Unfortunately this is not the case for the default models used by DROrthoForest, but it can be addressed by explicitly specifying the random state of each model, like so:
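from econml.orf import DROrthoForest
from econml.sklearn_extensions.linear_model import WeightedLasso
from sklearn.linear_model import LogisticRegression

est = DROrthoForest(n_trees=100,
                    max_depth=5,
                    model_Y=WeightedLasso(alpha=0.01),
                    propensity_model=LogisticRegression(penalty='l1', solver='saga', multi_class='auto',
                                                        random_state=2))  # fixed seed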
(where I've just chosen an arbitrary value for the random state of the estimator; you could just use 0 or any other fixed value of your choice).
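To illustrate why fixing the random state helps, here is a toy sketch in plain Python. The helper `fit_logistic_sgd` and the data are hypothetical and are not EconML or scikit-learn code; it is only a stand-in for any solver that shuffles its data randomly. With a fixed seed, two fits on the same data return identical coefficients, while an unseeded fit will typically differ a little.

```python
import math
import random

def fit_logistic_sgd(xs, ys, seed=None, epochs=5, lr=0.1):
    """Toy one-feature logistic regression fit by SGD with shuffling.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # seed=None means a fresh, unpredictable seed
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # the only source of randomness in the fit
        for i in order:
            p = 1.0 / (1.0 + math.exp(-(w * xs[i] + b)))
            w -= lr * (p - ys[i]) * xs[i]
            b -= lr * (p - ys[i])
    return w, b

# Small fixed data set (made up for this example).
xs = [0.1, 0.4, 0.5, 0.9, 1.3, 1.7, 2.0, 2.2]
ys = [0, 0, 1, 0, 1, 1, 0, 1]

same_seed_a = fit_logistic_sgd(xs, ys, seed=2)
same_seed_b = fit_logistic_sgd(xs, ys, seed=2)
unseeded = fit_logistic_sgd(xs, ys)

print(same_seed_a == same_seed_b)  # identical when the seed is fixed
print(same_seed_a, unseeded)       # typically not identical
```

The same idea applies to the propensity model above: passing a fixed random_state removes one source of run-to-run variation in the predict-time fits.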
The model refitting at estimation time is a fundamental property of the Orthogonal Random Forest algorithm, but we'll see if there is something we can do at the library level to make it easier to get consistent results, since this is a potentially confusing aspect of the output.