
NonParamDMLIV sometimes raises ZeroDivisionError

Open fverac opened this issue 1 year ago • 3 comments

It seems that one can run into a ZeroDivisionError when using NonParamDMLIV. Reproduction code is below. Note that the error is intermittent: you may have to run the code multiple times before hitting the ZeroDivisionError.

From briefly looking into it, it seems the T residuals are all zeros when the error occurs.

econml version 0.15.0b1. Haven't tried other versions.

Let me know if I'm missing something!

from econml.iv.dml import NonParamDMLIV
import numpy as np
from sklearn.linear_model import LinearRegression

n = 100
d_x = 3

Y = np.random.normal(size=(n,))
T = np.random.normal(size=(n,))
X = np.random.normal(size=(n, d_x))
Z = np.random.normal(size=(n,))

est = NonParamDMLIV(discrete_instrument=False, discrete_treatment=False, model_final=LinearRegression())

est.fit(Y, T, Z=Z, X=X)

fverac • Jan 02 '24 21:01

This is an interesting failure mode. The final model weights the rows by the estimated variance (E[T|Z,X,W]-E[T|X,W])^2, so if the estimates for E[T|Z,X,W] are always identical to those for E[T|X,W], all the weights are zero, which leads to this problem.
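To make the mechanism concrete, here is a minimal NumPy sketch of how an all-zero weight vector can surface as a ZeroDivisionError. The arrays `t_pred_xzw` and `t_pred_xw` are made-up stand-ins for the two first-stage predictions, and using `np.average` is an assumption for illustration; I have not checked that this is the exact call that fails inside EconML.

```python
import numpy as np

# Hypothetical first-stage predictions that happen to be identical.
t_pred_xzw = np.full(5, 0.1)  # stand-in for estimates of E[T | Z, X, W]
t_pred_xw = np.full(5, 0.1)   # stand-in for estimates of E[T | X, W]

# Row weights as described above: squared difference of the two estimates.
weights = (t_pred_xzw - t_pred_xw) ** 2  # every entry is 0.0

y = np.arange(5.0)
try:
    # A weighted average has to normalize by sum(weights), which is zero here.
    np.average(y, weights=weights)
    error = None
except ZeroDivisionError as exc:
    error = exc

print(type(error).__name__)
```

Any weighted computation that normalizes by the sum of the weights has the same problem, which would explain why the failure only shows up on runs where the two sets of predictions coincide.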

In this particular case, Lasso is regularizing all coefficients to 0, so the estimators always (correctly) predict E[T] = 0 regardless of whether we condition on Z,X,W or just on X,W.
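A rough sketch of that effect with plain scikit-learn, outside of EconML: the fixed penalty `alpha` below is an assumption chosen to force every coefficient to zero deterministically, standing in for whatever penalty cross-validation happens to pick on an unlucky run. With pure-noise data, both treatment models collapse to the same intercept-only prediction.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d_x = 100, 3
T = rng.normal(size=n)
X = rng.normal(size=(n, d_x))
Z = rng.normal(size=(n, 1))
XZ = np.hstack([X, Z])

# Smallest penalty at which all Lasso coefficients are zero (centered data):
# max_j |x_j' t| / n. Doubling it guarantees an intercept-only fit.
XZ_centered = XZ - XZ.mean(axis=0)
alpha_max = np.max(np.abs(XZ_centered.T @ (T - T.mean()))) / n
alpha = 2 * alpha_max  # assumption: mimics a heavily regularizing CV choice

model_t_xz = Lasso(alpha=alpha).fit(XZ, T)  # estimate of E[T | Z, X]
model_t_x = Lasso(alpha=alpha).fit(X, T)    # estimate of E[T | X]

# Both models predict the same constant, so the squared difference vanishes.
weights = (model_t_xz.predict(XZ) - model_t_x.predict(X)) ** 2
print(model_t_xz.coef_, model_t_x.coef_, weights.max())
```

With a cross-validated penalty the coefficients are only sometimes shrunk all the way to zero, which is consistent with the error being intermittent.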

Hopefully this is less likely to occur with real-world data, but we could at least raise a more meaningful error if we do run into this scenario. I do think it is a real error condition, though: we depend on the instrument affecting treatment for identification, so I don't think ignoring it and producing an estimate (say, by using all 1s for the weights if they turn out to all be 0s) would be appropriate.
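As a sketch of what such a check might look like: the helper name `check_instrument_weights`, its `tol` parameter, and the message wording are all hypothetical, and none of this is existing EconML code.

```python
import numpy as np

def check_instrument_weights(t_pred_xzw, t_pred_xw, tol=1e-12):
    """Hypothetical guard (not part of EconML): compute the row weights and
    fail with a descriptive error if the instrument never shifts the
    predicted treatment."""
    weights = (np.asarray(t_pred_xzw) - np.asarray(t_pred_xw)) ** 2
    if np.all(weights <= tol):
        raise ValueError(
            "Estimates of E[T|Z,X,W] and E[T|X,W] are identical for every row, "
            "so all final-stage weights are zero. The instrument does not "
            "appear to affect the treatment, and the effect is not identified."
        )
    return weights

# Usage: identical predictions trigger the descriptive error.
try:
    check_instrument_weights(np.zeros(4), np.zeros(4))
    message = None
except ValueError as exc:
    message = str(exc)

# Predictions that differ pass through and yield usable weights.
ok_weights = check_instrument_weights([0.5, 0.0, -0.5], [0.0, 0.0, 0.0])
```

Raising instead of substituting default weights keeps the behavior in line with the point above: the estimate is not identified without a relevant instrument, so the caller should be told rather than handed a number.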

kbattocchi • Jan 03 '24 17:01