EconML
Difference between X and W
Hi! Can you explain what the difference is between confounders and controls in Double ML? Shouldn't we treat all the features as X, since confounders (W) influence Y and therefore should be in the final model?
Both X and W are used to predict both the outcome (Y) and the treatment (T). The difference is that we assume that only the features X have an effect on the strength of the relationship between Y and T. That is, we assume that the treatment effect Theta is a function of X but not W. You could certainly include all confounders in X instead of in W, but there are also cases where you expect the effect to be heterogeneous only with respect to some important subset. Hope that helps, but please feel free to follow up with any other questions.
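To make the distinction concrete, here is a minimal sketch of the partialling-out (residual-on-residual) idea behind Double ML, written in plain NumPy rather than with EconML itself. Everything in it is an assumption made for illustration: the simulated data, the effect Theta(X) = 1 + 2X, the polynomial first stage, and the `residualize` helper. Cross-fitting is left out for brevity. In EconML, the same roles would be expressed by passing the arrays as `X=` and `W=` to an estimator's `fit` method.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Illustrative data: both X and W confound T and Y,
# but only X changes the size of the treatment effect.
X = rng.normal(size=n)  # feature: drives effect heterogeneity
W = rng.normal(size=n)  # control: confounder only
T = 0.5 * X + 0.5 * W + rng.normal(size=n)
Y = (1.0 + 2.0 * X) * T + X + 1.5 * W + rng.normal(size=n)


def residualize(target, features):
    """Return target minus its least-squares fit on features."""
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return target - features @ coef


# First stage: X and W are BOTH used to predict Y and T.
# Polynomial terms are included so the linear fit can capture E[Y | X, W].
nuisance = np.column_stack([np.ones(n), X, W, X * W, X**2, W**2])
Y_res = residualize(Y, nuisance)
T_res = residualize(T, nuisance)

# Final stage: only X enters the effect model, Theta(X) = a + b * X.
final = np.column_stack([T_res, T_res * X])
theta, *_ = np.linalg.lstsq(final, Y_res, rcond=None)
intercept, slope = theta

print(f"Estimated Theta(X) = {intercept:.2f} + {slope:.2f} * X")
```

W appears only in the first stage, where it helps remove confounding. The final stage interacts the treatment residual with X alone, which is what "Theta is a function of X but not W" amounts to in this sketch.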
Hi, @kbattocchi! I have a similar question on the API design / notation used. Reading the API Design section of the documentation:
W are other observable co-variates that we believe are affecting the potential outcome Y and potentially also the treatment T. (...) We will refer to variables W as controls. (...) The variables X can also be thought of as control variables, but they are special in the sense that they are a subset of the controls with respect to which we want to measure treatment effect heterogeneity. We will refer to them as features.
To my understanding, "controls" in the literature simply refers to the variables we add to the model, and graphical criteria can be used to discern whether they are good, neutral or bad for inference (A Crash Course in Good and Bad Controls, Cinelli et al., 2021). From the quoted text, it is clear how confounders (aka good controls) fit into the EconML API: if they are important for effect heterogeneity they are part of X, otherwise they are part of W. But what about other variables that are not confounders for the causal relation under study, yet still benefit the model, e.g. by reducing the variance of the estimates? One example would be a cause of the outcome that does not open backdoor paths. In small samples, the inclusion of such auxiliary variables might make or break statistical significance, and I am not sure how they fit into the API.
@ggiannarakis This is a good question. I believe the same logic applies to the heterogeneous treatment effect setting that we use. You'd want to include variables that affect the outcome but not the treatment, because this removes outcome variation that isn't attributable to the treatment. You'd want to exclude variables that affect the treatment but not the outcome: these act only through the treatment, so removing their effect reduces the treatment variation that is left for estimating the strength of the effect.
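A small Monte Carlo sketch of that reasoning, under simplifying assumptions that are mine rather than anything EconML does internally: a constant treatment effect of 1.0, purely linear relationships, no confounding, and ordinary least squares in place of a Double ML estimator. `Z_y` stands for a variable that affects only the outcome and `Z_t` for one that affects only the treatment; both names and the `effect_estimate` helper are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)


def effect_estimate(Y, T, extra=None):
    """OLS coefficient on T, optionally adjusting for one extra variable."""
    cols = [np.ones_like(T), T]
    if extra is not None:
        cols.append(extra)
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return coef[1]


n, reps = 200, 500
estimates = {"T only": [], "T + outcome-only": [], "T + treatment-only": []}

for _ in range(reps):
    Z_y = rng.normal(size=n)  # affects the outcome only
    Z_t = rng.normal(size=n)  # affects the treatment only
    T = 1.5 * Z_t + rng.normal(size=n)
    Y = 1.0 * T + 2.0 * Z_y + rng.normal(size=n)

    estimates["T only"].append(effect_estimate(Y, T))
    estimates["T + outcome-only"].append(effect_estimate(Y, T, Z_y))
    estimates["T + treatment-only"].append(effect_estimate(Y, T, Z_t))

# Spread of the effect estimate across repeated samples.
spread = {name: float(np.std(vals)) for name, vals in estimates.items()}
for name, value in spread.items():
    print(f"{name:>20}: std of estimate = {value:.3f}")
```

In this setup, adjusting for `Z_y` should tighten the estimate, while adjusting for `Z_t` should widen it, because it strips out most of the variation in T. Following the answer above, a variable like `Z_y` would be passed in W (or in X if the effect is expected to vary with it), and a variable like `Z_t` would be left out.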