Incorrect Code in Chapter 20 (and theoretical nitpicking)

Open aliquod opened this issue 1 year ago • 1 comments

First of all, thank you for making this very accessible book!

In the section about continuous treatment in chapter 20, you defined

Y^*_i := (Y_i- \bar{Y})\dfrac{(T_i - M(T_i))}{(T_i - M(T_i))^2}

to be the pseudo-outcome[^1] and then dropped the denominator, since you are interested in comparing treatment effects rather than their absolute values. But dropping it does not preserve the order of those effects[^2]. Why not cancel the common factor instead and simplify it to

Y^*_i = \dfrac{Y_i- \bar{Y}}{T_i - M(T_i)}?
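To make the ordering point concrete, here is a tiny sketch with made-up residuals for two units (the numbers are hypothetical and not taken from the book's data). It only compares the ratio form with the product form; the CATE estimate would still come from averaging these values.

```python
import numpy as np

# Hypothetical residuals for two units (illustrative numbers only).
y_res = np.array([2.0, 3.0])  # Y_i - \bar{Y}
t_res = np.array([0.5, 1.0])  # T_i - M(T_i)

ratio = y_res / t_res    # pseudo-outcome with the denominator kept
product = y_res * t_res  # pseudo-outcome with the denominator dropped

print(ratio)    # [4. 3.]
print(product)  # [1. 3.]

# The two forms rank the units in opposite order.
print(np.argsort(ratio), np.argsort(product))  # [1 0] [0 1]
```

Because each unit is scaled by its own (T_i - M(T_i))^2, the first unit ranks highest under the ratio but lowest under the product.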

Now onto the actual issue: the code block that came after

Y^*_i = (Y_i- \bar{Y})(T_i - M(T_i))

is

y_star_cont = (train["price"] - train["price"].mean()
               *train["sales"] - train["sales"].mean())

but this is missing some parentheses, so it actually computes

Y^*_i \overset{???}{=} Y_i- (\bar{Y} \times T_i) - M(T_i).
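A possible fix is to parenthesize each residual before multiplying. The sketch below uses a small synthetic `train` frame (an assumption, standing in for the book's data) and, like the quoted code, uses sample means for both centering terms. The `expanded` variable is only there to check what the original line evaluates to.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the book's training data (assumption).
rng = np.random.default_rng(0)
train = pd.DataFrame({
    "price": rng.uniform(1, 10, size=5),
    "sales": rng.uniform(100, 200, size=5),
})

# As quoted from the chapter: `*` binds tighter than `-`.
y_star_buggy = (train["price"] - train["price"].mean()
                * train["sales"] - train["sales"].mean())

# What that line actually evaluates to, written out explicitly.
expanded = (train["price"]
            - (train["price"].mean() * train["sales"])
            - train["sales"].mean())

# Suggested fix: parenthesize each residual before multiplying.
y_star_cont = ((train["price"] - train["price"].mean())
               * (train["sales"] - train["sales"].mean()))

print(np.allclose(y_star_buggy, expanded))     # True
print(np.allclose(y_star_buggy, y_star_cont))  # False
```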

[^1]: The denominator I assume is an estimate of the conditional variance Var(T|X), but for most regression methods this residual is an underestimate.

[^2]: In the end we will average those values up to estimate the CATE. But unlike the randomized treatment case where every term is scaled by σ² and can be un-scaled without changing order, here each term has a different factor.

aliquod, Aug 02 '24 15:08