EconML
How to make predictions with models_y?
I have the following setup (no W variables):
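from econml.dml import LinearDML
from sklearn.linear_model import LinearRegression, LogisticRegression
est = LinearDML(model_y=LinearRegression(), model_t=LogisticRegression(max_iter=1000), discrete_treatment=True, random_state=55)
est.fit(Y=y, T=T, X=X, W=None)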
I have 8505 observations and 78 features in X. I can make predictions with models_t, e.g. est.models_t[0][0].predict(X), but not with models_y: the equivalent call raises "ValueError: X has 78 features per sample; expecting 6162".
Why does model_y not expect 78 features? What is the correct way to format input to make predictions using models_y?
By default, LinearDML has its linear_first_stages attribute set to True. For technical reasons, this causes the Y model to be trained on the cross product of [X, W] with [1, phi(X)], where phi is the featurizer (by default the identity). Since you have no W, this simplifies to a permutation of [X, X cross X], which has 78 + 78*78 = 6162 columns. The extra terms are needed because, under the model Y = theta(X)·T + g(X, W) + noise with theta linear in [1, phi(X)], the conditional mean is E[Y | X, W] = theta(X)·E[T | X, W] + g(X, W). If the first-stage models are linear (e.g. Lasso), the product theta(X)·E[T | X, W] contains interactions between [X, W] and phi(X), so a linear Y model on [X, W] alone would be misspecified even if both first-stage models were otherwise correctly specified, and we would not be guaranteed to recover the correct treatment effects.
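The 6162 figure can be checked with a short numpy sketch. It mimics the construction described above with W = None and an identity featurizer; the helper name cross_with_intercept is hypothetical, and the column order it produces is an assumption that need not match EconML's internal ordering, so treat it as a way to understand the column count rather than as a recipe for building inputs to models_y.

```python
import numpy as np


def cross_with_intercept(XW, F):
    """Hypothetical helper: multiply every column of XW by every column of [1, F].

    Sketch only; the column order (columns of [1, F] vary fastest) is an
    assumption and need not match EconML's internal ordering.
    """
    n = XW.shape[0]
    ones_F = np.hstack([np.ones((n, 1)), F])  # [1, phi(X)]
    # (n, k, 1) * (n, 1, m) -> (n, k, m), flattened to (n, k * m)
    return (XW[:, :, None] * ones_F[:, None, :]).reshape(n, -1)


rng = np.random.default_rng(0)
n_obs, n_features = 100, 78  # 78 features as in the question; only 100 rows to keep it small
X = rng.normal(size=(n_obs, n_features))
phi_X = X  # assumes the default (identity) featurizer
features_for_y = cross_with_intercept(X, phi_X)  # no W, so [X, W] is just X

print(features_for_y.shape)                  # (100, 6162)
print(n_features + n_features * n_features)  # 6162: X itself plus all products X_i * X_j
```

The 78 columns of the form X_i * 1 reproduce X, and the remaining 78 * 78 columns are the products X_i * X_j, which is why a Y model fitted this way rejects a 78-column input.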
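However, in your case the treatment model is logistic anyway, so E[T | X, W] is not linear and the cross terms no longer make the Y model correctly specified. I'd therefore recommend setting linear_first_stages to False; the Y model will then be trained on X alone (78 columns), as you expected, and the fitted models in models_y can be called on X directly.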
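A small noise-free simulation can make the specification argument concrete. It is a sketch with made-up coefficients that uses plain least squares rather than EconML's cross-fitted first stages, so it illustrates only whether a linear Y model can represent E[Y | X] exactly: with a linear treatment model, regressing on X crossed with [1, X] fits exactly while X alone does not; with a logistic treatment model, even the cross terms leave a residual. The helper names (sigmoid, cross_with_intercept, fit_rmse) are illustrative.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cross_with_intercept(X):
    """Columns X_i and X_i * X_j: X crossed with [1, X] (identity featurizer)."""
    n = X.shape[0]
    ones_X = np.hstack([np.ones((n, 1)), X])
    return (X[:, :, None] * ones_X[:, None, :]).reshape(n, -1)


def fit_rmse(design, target):
    """Least-squares fit (with an intercept column) and its root-mean-squared residual."""
    D = np.hstack([np.ones((design.shape[0], 1)), design])
    coef, *_ = np.linalg.lstsq(D, target, rcond=None)
    return float(np.sqrt(np.mean((target - D @ coef) ** 2)))


rng = np.random.default_rng(0)
n, d = 2000, 3
X = rng.normal(size=(n, d))
a = np.array([0.8, -0.5, 0.3])   # hypothetical treatment coefficients
b = np.array([1.0, 0.5, -0.7])   # hypothetical effect coefficients: theta(X) = 2 + X @ b
c = np.array([0.4, 0.2, -0.1])   # hypothetical direct effect of X on Y
theta = 2.0 + X @ b

# Noise-free conditional means E[Y | X] = theta(X) * E[T | X] + X @ c
ey_linear_t = theta * (0.5 + X @ a) + X @ c             # linear treatment model
ey_logistic_t = theta * sigmoid(0.5 + X @ a) + X @ c    # logistic (binary) treatment model

results = {
    "linear T, Y on X only": fit_rmse(X, ey_linear_t),
    "linear T, Y on X cross [1, X]": fit_rmse(cross_with_intercept(X), ey_linear_t),
    "logistic T, Y on X cross [1, X]": fit_rmse(cross_with_intercept(X), ey_logistic_t),
}
for name, rmse in results.items():
    print(f"{name}: RMSE = {rmse:.2e}")
```

In the linear case the product theta(X)·E[T | X] is a quadratic polynomial in X, which the cross terms span exactly; in the logistic case the sigmoid introduces higher-order terms that no quadratic design captures.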
Thanks for the quick reply! Would it be wrong to use linear_first_stages = True in my case? Or is it just unnecessary from a theoretical standpoint? Is it okay to have linear_first_stages = True and include the cross terms in model_y?