EconML
[question] Can't replicate manual DML results in EconML
Hello, I am new to the EconML library. I am comparing DML results for a single treatment variable obtained with EconML against a version I coded manually, and the results are wildly different: the manually coded DML with an OLS final stage gives a much higher R2 score. Here's what I do:
- Log-transform the treatment and effect (outcome) variables
- Z-score all other input features (they are all numeric)
- Use DML with LGBMRegressor for both model_y and model_t (EconML) with leave-one-out cross-validation (number of folds = input size - 1; I also tried 2, 5, 10, etc.). Use lgbm.cv with the same input size - 1 folds (manual). Both use default hyperparameters.
- Use StatsModelsLinearRegression for the final regression (EconML). Use ols from statsmodels.formula.api for the final regression (manual).
- Also tried running ols from statsmodels.formula.api on the residuals that I get from the EconML DML estimator.
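For comparison, a manual cross-fitted DML along the lines of the steps above might look like the minimal sketch below. It is not the poster's code. Assumptions: scikit-learn's `GradientBoostingRegressor` stands in for LightGBM so the sketch needs no extra packages (`lightgbm.LGBMRegressor` has the same fit/predict interface and could be swapped in); scikit-learn's `LinearRegression` replaces statsmodels `ols` for the final stage; the helper name `manual_dml`, the 5-fold `KFold` and the synthetic data are hypothetical. The important property is that every residual comes from a model that never saw that row (out-of-fold predictions via `cross_val_predict`).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict


def manual_dml(y, t, W, n_splits=5, seed=0):
    """Hypothetical helper: cross-fitted partialling-out estimate of a
    constant effect of t on y, controlling for W."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Out-of-fold first-stage predictions: each row is predicted by a
    # model trained only on the other folds (stand-in for LightGBM).
    y_hat = cross_val_predict(GradientBoostingRegressor(random_state=seed), W, y, cv=cv)
    t_hat = cross_val_predict(GradientBoostingRegressor(random_state=seed), W, t, cv=cv)
    y_res = y - y_hat
    t_res = t - t_hat
    # Final stage: regress outcome residuals on treatment residuals, no intercept.
    T = t_res.reshape(-1, 1)
    final = LinearRegression(fit_intercept=False).fit(T, y_res)
    r2 = final.score(T, y_res)  # R2 of the residual-on-residual regression
    return final.coef_[0], r2, y_res, t_res


# Synthetic data only (hypothetical): true effect of t on y is 2.
rng = np.random.default_rng(0)
n = 1000
W = rng.normal(size=(n, 3))                     # z-scored-like controls
t = W[:, 0] + rng.normal(size=n)                # treatment depends on W
y = 2.0 * t + W[:, 1] + rng.normal(size=n)      # outcome
theta, r2, y_res, t_res = manual_dml(y, t, W)
print(round(theta, 2), round(r2, 2))
```

The R2 reported here is that of the residual-on-residual regression, so it is only comparable with an R2 computed on the same kind of regression in the other pipeline.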
I don't expect exactly the same results, but what I am getting is too different: R2=0.01 (EconML) vs. R2=0.55 (manual).
I can't include the code because of data restrictions, but if it would help, I can try it on sample data and see whether this behaviour persists. Before I do that, though, perhaps there is something obviously wrong with my process. Please let me know if so.
Without more information this seems like it would be very difficult to solve. Could you at least provide the code you're using to get the residuals from the estimator and run your own regression?
(Also, on an unrelated note, I'm curious how you're doing leave-one-out cross-validation. Keep in mind that EconML already splits your data internally into folds for cross-fitting, so the first-stage models won't see the original number of rows anyway.)
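To make the row counts concrete, here is a small illustration with hypothetical numbers: 100 rows, 5 cross-fitting folds built with scikit-learn's `KFold`, and the assumption that a leave-one-out search is run inside each outer training fold. The helper `first_stage_sizes` is hypothetical; it only counts rows and does not call EconML.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut


def first_stage_sizes(n_rows, n_outer_folds):
    """Hypothetical helper: for each outer cross-fitting fold, return
    (rows in the outer training fold, number of inner LOO models,
    set of row counts each inner LOO model is trained on)."""
    data = np.zeros((n_rows, 1))
    sizes = []
    for train_idx, _ in KFold(n_splits=n_outer_folds).split(data):
        inner_data = data[train_idx]  # only the outer training fold is available
        inner = LeaveOneOut()
        inner_sizes = {len(tr) for tr, _ in inner.split(inner_data)}
        sizes.append((len(train_idx), inner.get_n_splits(inner_data), inner_sizes))
    return sizes


print(first_stage_sizes(100, 5)[0])  # (80, 80, {79})
```

Under these assumptions each first-stage model is fit on 80 of the 100 rows, and each inner leave-one-out model on 79.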