
How to use EconML when we don't know how to distinguish between X and W?

Open · itewqq opened this issue 2 years ago · 6 comments

Hi there,

I have noted the following discussion here: https://github.com/microsoft/EconML/issues/589

What I'm curious about is this: in a real scenario, for some features I can't tell which ones have an effect on T and which ones don't, so how should I use EconML in such a situation?

Also, if I put all the features into X, will that have any negative effect on the final result?

Thanks!

itewqq · Jul 20 '22 08:07

Regarding the negative effect on the final result: if you can't distinguish W and X, which I assume are the sufficient adjustment set and the set of all features respectively, then you are going to get an incorrect causal estimate. As I understand it, the way you go from the full set of features X to the specific set of features W is through domain knowledge.

My question would be: why do we even specify X if we are already specifying W?

salman-moh · Jul 25 '22 06:07

W and X should both be things that affect T and Y, the difference is that things in X are allowed to also affect the strength of the relationship between T and Y. In general, if you don't know whether something might affect that relationship or not, it's probably safer to default to including it in X rather than in W. However, the downside of this is that this makes the treatment effect estimation problem harder, so you should expect to get wider confidence intervals on the effect estimate even if it turns out that that particular feature does not affect the strength of the relationship.
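To make the confidence-interval trade-off concrete, here is a small simulated sketch in plain NumPy (not EconML itself). It assumes a constant true effect of 2.0, a single confounder `z` that does not modify the effect, and oracle residuals in place of fitted first-stage models; the helper `ols_with_se` is hypothetical. It compares treating `z` as W (constant effect) with treating it as X (effect linear in `z`), evaluating the effect at z = 2.

```python
import numpy as np

def ols_with_se(design, y):
    """OLS coefficients and their classical (homoskedastic) covariance matrix."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    n, k = design.shape
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return coef, cov

rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)                  # confounder that does NOT modify the effect
T = 0.5 * z + rng.normal(size=n)        # z affects the treatment
Y = 2.0 * T + z + rng.normal(size=n)    # true effect is a constant 2.0

# Oracle residuals: subtract the true E[T|z] = 0.5z and E[Y|z] = 2z
# (a real DML fit would estimate these with first-stage models instead).
t_res = T - 0.5 * z
y_res = Y - 2.0 * z

z0 = 2.0  # point at which the effect is evaluated

# z in W: constant effect theta, regress y_res on t_res alone.
coef_w, cov_w = ols_with_se(t_res[:, None], y_res)
effect_w, se_w = coef_w[0], np.sqrt(cov_w[0, 0])

# z in X: effect theta(z) = a + b*z, regress y_res on [t_res, t_res * z].
coef_x, cov_x = ols_with_se(np.column_stack([t_res, t_res * z]), y_res)
phi0 = np.array([1.0, z0])
effect_x = phi0 @ coef_x
se_x = np.sqrt(phi0 @ cov_x @ phi0)

print(f"z in W: effect {effect_w:.3f} (se {se_w:.3f})")
print(f"z in X: effect at z={z0}: {effect_x:.3f} (se {se_x:.3f})")
```

Both estimates should land near 2.0, but in this setup the standard error when `z` is in X comes out roughly √5 ≈ 2.2 times larger at z = 2, because the interval now also carries the uncertainty of the (truly zero) slope on `z`.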

kbattocchi · Jul 26 '22 00:07

So then, are X and W together all just confounding variables?

salman-moh · Jul 26 '22 11:07

Yes, exactly. And for techniques like Double ML, we concatenate X and W together for fitting our first stage models, but then we only featurize and interact X with the T residuals when fitting the second stage model.
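A rough sketch of that split in plain NumPy follows, under assumptions of my own rather than EconML's actual code: the first-stage models are least-squares fits on degree-2 polynomial features of the concatenated [X, W] (EconML's DML estimators let you plug in any learner via `model_y` and `model_t`), residuals are cross-fitted over two folds, and the final stage uses the illustrative featurization phi(X) = [1, X], so theta(X) = beta_0 + beta_1 * X. The helpers `poly2`, `fit_predict_ls`, and `dml_sketch` are hypothetical.

```python
import numpy as np

def poly2(A):
    """Columns of A plus all pairwise products and squares (no intercept)."""
    d = A.shape[1]
    cross = [A[:, i] * A[:, j] for i in range(d) for j in range(i, d)]
    return np.column_stack([A] + cross)

def fit_predict_ls(train_feats, train_target, test_feats):
    """Least-squares fit with an intercept, standing in for any ML regressor."""
    A_train = np.column_stack([np.ones(len(train_feats)), train_feats])
    A_test = np.column_stack([np.ones(len(test_feats)), test_feats])
    coef, *_ = np.linalg.lstsq(A_train, train_target, rcond=None)
    return A_test @ coef

def dml_sketch(Y, T, X, W, n_folds=2, seed=0):
    """Partialling-out DML with a linear final stage theta(X) = [1, X] @ beta."""
    n = len(Y)
    # First stage: X and W are concatenated and used together as controls.
    XW = poly2(np.column_stack([X, W]))
    y_res, t_res = np.empty(n), np.empty(n)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Cross-fitting: each fold is residualized with models fit on the other folds.
        y_res[test] = Y[test] - fit_predict_ls(XW[train], Y[train], XW[test])
        t_res[test] = T[test] - fit_predict_ls(XW[train], T[train], XW[test])
    # Second stage: only X is featurized and interacted with the T residuals.
    phi = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(phi * t_res[:, None], y_res, rcond=None)
    return beta

rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 1))   # effect modifier (also a confounder)
W = rng.normal(size=(n, 2))   # confounders that do not modify the effect
T = X[:, 0] + W @ np.array([0.5, -0.5]) + rng.normal(size=n)
theta = 1.0 + 2.0 * X[:, 0]   # true heterogeneous effect
Y = theta * T + X[:, 0] + W.sum(axis=1) + rng.normal(size=n)

beta = dml_sketch(Y, T, X, W)
print("estimated [beta_0, beta_1]:", np.round(beta, 2))
```

With this simulated data (true theta(X) = 1 + 2X) the printed coefficients should come out close to [1, 2]. W never enters the final stage; it only helps remove confounding through the residuals.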

kbattocchi · Jul 26 '22 12:07

What if a feature does not directly affect Y or T, but does affect the strength of the relationship between T and Y? Should I add it to X?

justforsoy · May 18 '23 09:05

@justforsoy If it affects the relationship between T and Y, then I think it must inherently affect Y as a result of this relationship (i.e. most of our estimators assume the structural model Y=theta(X)*T+..., so Y does vary based on X even if X does not appear in the remainder of the expression).
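One way to see this: assume, purely for illustration, that the remainder depends only on W (so the feature in X has no direct effect on Y), and pick θ(X) = 1 + 2X for concreteness:

$$
\begin{aligned}
Y &= \theta(X)\,T + g(W) + \varepsilon, \qquad \mathbb{E}[\varepsilon \mid T, X, W] = 0,\\
\mathbb{E}[Y \mid T, X, W] &= (1 + 2X)\,T + g(W),\\
\frac{\partial}{\partial X}\,\mathbb{E}[Y \mid T, X, W] &= 2T.
\end{aligned}
$$

So Y moves with X whenever T ≠ 0, even though X has no term of its own in the remainder; that dependence runs entirely through θ(X).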

But to directly answer your question, yes, a variable should go into X if it affects the strength of the relationship.

kbattocchi · May 18 '23 17:05