python-causality-handbook
Incorrect Code in Chapter 20 (and theoretical nitpicking)
First of all, thank you for making this very accessible book!
In the section about continuous treatment in chapter 20, you defined
Y^*_i := (Y_i- \bar{Y})\dfrac{(T_i - M(T_i))}{(T_i - M(T_i))^2}
to be the pseudo-outcome[^1], and then you dropped the denominator because you are interested in comparing treatment effects, not in their absolute values. But dropping it does not preserve the ordering[^2]. Instead, why not just simplify it to
Y^*_i = \dfrac{Y_i- \bar{Y}}{T_i - M(T_i)}?
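To make the ordering concern concrete, here is a minimal sketch with two made-up residual pairs (the numbers and the names `y_res` and `t_res` are my own assumptions, not taken from the book). It only compares individual pseudo-outcomes, not the averaged CATE estimates, but it shows that the product form and the ratio form can rank the same two units differently.

```python
import numpy as np

# Hypothetical residuals for two units (illustrative values only).
y_res = np.array([2.0, 3.0])  # Y_i - mean(Y)
t_res = np.array([0.5, 1.0])  # T_i - M(T_i)

# Simplified pseudo-outcome: the common factor (T_i - M(T_i)) is cancelled.
ratio = y_res / t_res

# Pseudo-outcome with the denominator dropped.
product = y_res * t_res

# Indices of the units sorted from smallest to largest pseudo-outcome.
ratio_order = np.argsort(ratio)
product_order = np.argsort(product)

print("ratio:  ", ratio, "order:", ratio_order)
print("product:", product, "order:", product_order)
```

Each unit is rescaled by its own factor (T_i - M(T_i))^2, so the two rankings need not agree.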
Now onto the actual issue: the code block that came after
Y^*_i = (Y_i- \bar{Y})(T_i - M(T_i))
is
y_star_cont = (train["price"] - train["price"].mean()
*train["sales"] - train["sales"].mean())
but this is missing some parentheses, so it actually computes
Y^*_i \overset{???}{=} Y_i- (\bar{Y} \times T_i) - M(T_i).
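For reference, here is a small sketch of the precedence problem on toy data. The values are made up, and only the column names `price` and `sales` come from the snippet above. Since `*` binds tighter than `-`, the expression as written is not the product of the two centered columns. Parenthesizing each difference is one way to get that product.

```python
import pandas as pd

# Toy data, invented for illustration; only the column names match the snippet.
train = pd.DataFrame({
    "price": [1.0, 2.0, 3.0, 6.0],
    "sales": [10.0, 8.0, 7.0, 3.0],
})

# As written: evaluates price - (mean(price) * sales) - mean(sales),
# because multiplication is applied before subtraction.
as_written = (train["price"] - train["price"].mean()
              * train["sales"] - train["sales"].mean())

# With explicit parentheses: product of the two centered columns.
intended = ((train["price"] - train["price"].mean())
            * (train["sales"] - train["sales"].mean()))

print(pd.DataFrame({"as_written": as_written, "intended": intended}))
```

On this toy frame the two columns differ in every row, so the missing parentheses change the result rather than just the style.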
[^1]: I assume the denominator is an estimate of the conditional variance Var(T|X), but for most regression methods this squared residual underestimates it.
[^2]: In the end we average these values to estimate the CATE. But unlike the randomized treatment case, where every term is scaled by σ² and can be un-scaled without changing the order, here each term has a different factor.