EconML
True TE estimates in customer segmentation example
Hi, I have a couple of questions for Case Study - Customer Segmentation at An Online Media Company
- In the true TE estimate part, why is it price*gamma_fn(X)/demand rather than just gamma_fn(X)? The formula in the DGP is Y = gamma(X)*T + beta(X,W), so I presumed gamma(X) is the CATE.
- How should we actually interpret CATE(X) when Y and T are log-transformed? As I understand it, with raw Y and T a CATE of 0.3 means that, at a given value of X, Y changes by 0.3 for a unit change in treatment. I'm not sure how that carries over to the log transformation used here.
I appreciate your responses. Thanks!
Have you figured this out? I have the same question.
To make the example realistic, we use a mis-specified model (one where the functional form of what we're estimating does not exactly match the data-generating process). We take the log transform of price and demand so that the treatment effect we calculate is the price elasticity of demand: roughly, the percentage change in demand that results from a 1% change in price, a quantity frequently used in pricing applications.
However, because this model is mis-specified, there is no simple linear relationship between log(Y) and log(T): log(Y) = log(gamma(X)*T + beta(X,W)), which can't be rewritten as log(T)*theta(X) + f(X,W). For small changes around the typical price, though, a linear approximation will hopefully be reasonable. In general, ∂log(Y)/∂log(T) = ∂Y/∂T * (T/Y), and here ∂Y/∂T = gamma(X), so the true treatment effect in the transformed system is gamma(X)*T/Y, which is the ground truth that we hope to recover.
Does that help?
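For anyone who wants to convince themselves of the chain-rule identity, here is a minimal numerical sketch. The `gamma_fn` and `beta_fn` below are hypothetical stand-ins chosen only for illustration (they are not the notebook's actual functions), and the point values of `t`, `x`, and `w` are arbitrary. The check compares gamma(X)*T/Y against a finite-difference estimate of ∂log(Y)/∂log(T).

```python
import numpy as np

# Hypothetical stand-ins for the DGP components (assumptions, not the notebook's code).
def gamma_fn(x):
    return -3.0 - 14.0 * (x < 1.0)

def beta_fn(x, w):
    return 20.0 + 0.5 * w + 5.0 * (x > 1.0)

def demand(t, x, w):
    # Noise-free version of Y = gamma(X)*T + beta(X, W).
    return gamma_fn(x) * t + beta_fn(x, w)

def analytic_elasticity(t, x, w):
    # dlog(Y)/dlog(T) = (dY/dT) * (T / Y) = gamma(X) * T / Y
    return gamma_fn(x) * t / demand(t, x, w)

def numeric_elasticity(t, x, w, eps=1e-6):
    # Central difference of log(Y) with respect to log(T).
    log_t = np.log(t)
    hi = np.log(demand(np.exp(log_t + eps), x, w))
    lo = np.log(demand(np.exp(log_t - eps), x, w))
    return (hi - lo) / (2 * eps)

t, x, w = 1.0, 2.0, 10.0  # arbitrary evaluation point
analytic = analytic_elasticity(t, x, w)
numeric = numeric_elasticity(t, x, w)
print(analytic, numeric)
```

The two values should agree to several decimal places, which is all the identity claims: the elasticity at a point is gamma(X)*T/Y exactly, even though log(Y) is not linear in log(T).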
And just to be even more explicit: the true treatment effect really is gamma(X)*T/Y (this is not an approximation). The issue with our mis-specified model is that it assumes the true treatment effect is a function of X only, which is not the case here. Because of how T and Y enter this expression, the true treatment effect depends not only on X but also on T and W. So for a given value of X we can average this expression over our data (or compute other quantities related to its distribution, such as the 5th and 95th percentiles), but we won't have a single CATE for any given X, as we would if our model were correctly specified.
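To illustrate the averaging described above, here is a rough sketch on simulated data. Everything in it is an assumption made for the example: a binary segment `x`, a single control `w`, a price `t` drawn uniformly, noise-free demand, and made-up coefficients. It computes gamma(X)*T/Y per row and then summarizes its distribution within each value of X.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data (assumptions for illustration only).
x = rng.integers(0, 2, size=n)        # binary customer segment
w = rng.uniform(0.0, 10.0, size=n)    # a control that shifts baseline demand
t = rng.uniform(0.5, 1.5, size=n)     # price

gamma = np.where(x == 0, -8.0, -3.0)  # hypothetical gamma(X)
beta = 20.0 + 0.5 * w                 # hypothetical beta(X, W)
y = gamma * t + beta                  # noise-free demand, positive by construction

# Row-level true effect in the log-log system: gamma(X) * T / Y.
true_te = gamma * t / y

def summarize(x_value):
    # Distribution of the true effect among rows with X == x_value.
    te = true_te[x == x_value]
    return {
        "mean": te.mean(),
        "q05": np.quantile(te, 0.05),
        "q95": np.quantile(te, 0.95),
    }

summary = {value: summarize(value) for value in (0, 1)}
for value, stats in summary.items():
    print(value, stats)
```

Within each segment the 5th and 95th percentiles differ, because T and W vary across rows that share the same X. That spread is what a model assuming a single CATE per X cannot capture; the per-segment mean is one reasonable summary to compare estimates against.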