EconML
Wrong ATE estimation result, expected positive ATE got negative ATE
Hi,
I am fitting a DML model to my data. I know the ATE of my treatment is positive, but the model gives me a negative result. Why does this happen, and how can I explain it? Is there any way to fix this wrong estimate?
It's hard to say based on the information you provided, but here are a few things to keep in mind:
- What do the confidence intervals look like? Depending on your data, it's possible that your estimate is just very imprecise (i.e. the confidence intervals are wide), such that the point estimate has the wrong sign but the "true" value is still within the confidence intervals.
- The quality of the estimate depends on the quality of the first stage models, particularly the treatment model. Have you chosen appropriate models given the structure of your data?
- DML models make several assumptions about your data (e.g. that the treatment effect is linear in the treatment and that there are no unmeasured confounders). If these assumptions are not satisfied in your setting, then you shouldn't expect to get accurate results.
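The first two checks can be tried out on synthetic data. The following is a minimal, hand-rolled sketch of cross-fitted partialling-out (residual-on-residual regression) using scikit-learn. It is not EconML's implementation, and the data-generating process, the variable names (`W`, `true_effect`, `theta`), and the normal-approximation interval are all assumptions made for illustration. It prints the out-of-fold fit of the two first-stage models, the effect estimate with an approximate 95% interval, and the raw regression slope of Y on T for comparison.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_predict

# Synthetic data (an assumption for illustration, not the poster's data).
rng = np.random.default_rng(0)
n = 2000
W = rng.normal(size=(n, 3))                              # observed confounders
T = W[:, 0] + 0.5 * W[:, 1] ** 2 + rng.normal(size=n)    # treatment depends on W
true_effect = -0.5                                       # constant, linear in T
Y = true_effect * T + 2.0 * W[:, 0] + W[:, 2] + rng.normal(size=n)

# First stage: out-of-fold predictions of Y and T from the confounders.
y_hat = cross_val_predict(GradientBoostingRegressor(random_state=0), W, Y, cv=5)
t_hat = cross_val_predict(GradientBoostingRegressor(random_state=0), W, T, cv=5)
r2_y = r2_score(Y, y_hat)
r2_t = r2_score(T, t_hat)

# Second stage: regress outcome residuals on treatment residuals.
y_res = Y - y_hat
t_res = T - t_hat
theta = (t_res @ y_res) / (t_res @ t_res)

# Heteroskedasticity-robust standard error and a normal-approximation 95% interval.
eps = y_res - theta * t_res
se = np.sqrt(np.sum(t_res ** 2 * eps ** 2)) / np.sum(t_res ** 2)
ci = (theta - 1.96 * se, theta + 1.96 * se)

# Raw slope of Y on T, ignoring the confounders, for comparison.
naive_slope = np.cov(T, Y)[0, 1] / np.var(T, ddof=1)

print("out-of-fold R^2, outcome model:  ", r2_y)
print("out-of-fold R^2, treatment model:", r2_t)
print("estimated effect:", theta, "approx. 95% CI:", ci)
print("raw slope of Y on T:", naive_slope)
```

In this synthetic setup the raw slope and the adjusted estimate have opposite signs, because the confounder `W[:, 0]` pushes T and Y in the same direction. A raw correlation between T and Y therefore does not by itself determine the sign of the effect once confounders are controlled for.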
If you have a concrete example you can share, then it might be possible to provide more specific guidance.
Our goal is to find what makes our outcome Y decrease, and Y should always decrease given the treatment T. I am sorry that I cannot share the data, but in my case giving a treatment should make my outcome decrease. Y and T are negatively correlated. However, I also have some wrong data in which Y decreases a bit when T increases. I used Gradient Boosting for the first-stage and second-stage models with LinearDML; my code is below.
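from econml.dml import LinearDML
from sklearn.ensemble import GradientBoostingRegressor

# Gradient boosting for both first-stage (nuisance) models, 5-fold cross-fitting
est1 = LinearDML(
    model_y=GradientBoostingRegressor(),
    model_t=GradientBoostingRegressor(),
    cv=5)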
I find that the ATE for one of my treatments t is negative: -5.658953873151606e-05, with a CI of (-6.571635269071133e-05, -4.746272477232079e-05). I don't know whether the data could make the estimate wrong, or how I can explain this result if it is actually a data issue.
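The numbers above can be checked against the first point raised earlier in the thread (whether the interval is wide enough to include values of the opposite sign). The short sketch below uses only the reported point estimate and interval. The implied standard error assumes a normal-based 95% interval, which is an assumption, since the confidence level used is not stated in the post.

```python
# Values as reported in the post.
point = -5.658953873151606e-05
ci_low, ci_high = -6.571635269071133e-05, -4.746272477232079e-05

# Does the interval exclude zero, i.e. lie entirely on one side of it?
excludes_zero = ci_low > 0 or ci_high < 0

# Width and centre of the interval.
half_width = (ci_high - ci_low) / 2
midpoint = (ci_high + ci_low) / 2

# Assumption: a symmetric, normal-based 95% interval (z = 1.96).
implied_se = half_width / 1.96
z_score = point / implied_se

print("interval excludes zero:", excludes_zero)
print("half-width:", half_width)
print("implied standard error (assuming 95% normal CI):", implied_se)
print("implied z-score:", z_score)
```

The reported interval lies entirely below zero, so imprecision alone does not account for the negative sign here. That leaves the other two points, the first-stage models and the modelling assumptions, as the places to look.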