doubleml-for-py
Store estimated models for nuisance parameters
Description
This PR implements the often-requested feature of storing the estimated models for the nuisance parameters. To use it, call the method `fit()` with the option `store_models=True`. Example:
```python
import numpy as np
import doubleml as dml
from doubleml.datasets import make_plr_CCDDHNR2018
from sklearn.ensemble import RandomForestRegressor
from sklearn.base import clone

np.random.seed(3141)
learner = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
# Separate copies of the base learner for the two nuisance parts
ml_l = clone(learner)
ml_m = clone(learner)
obj_dml_data = make_plr_CCDDHNR2018(alpha=0.5, n_obs=500, dim_x=20)
dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_l, ml_m)
# Keep every fitted nuisance model from the cross-fitting procedure
dml_plr_obj.fit(store_models=True)
```
The estimated models are then stored in the attribute `dml_plr_obj.models`:

```python
>>> dml_plr_obj.models
{'ml_l': {'d': [[RandomForestRegressor(), RandomForestRegressor(), RandomForestRegressor(), RandomForestRegressor(), RandomForestRegressor()]]}, 'ml_m': {'d': [[RandomForestRegressor(), RandomForestRegressor(), RandomForestRegressor(), RandomForestRegressor(), RandomForestRegressor()]]}}
```
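Note that the number of fitted models depends on the considered model and on the settings, such as `n_folds` and `n_rep`. The outer dictionary contains one entry for each nuisance part (here `ml_l` and `ml_m`). For each nuisance part, there is a dictionary containing one entry for each treatment variable (here only `'d'`). Each of these entries is a list of length `n_rep` (the number of repetitions of the cross-fitting), whose elements are lists of length `n_folds` (the number of folds per repetition). A single model is therefore accessed as `dml_plr_obj.models[nuisance_part][treatment][i_rep][i_fold]`; in the example above, 2 nuisance parts × 1 treatment × 1 repetition × 5 folds gives 10 stored models.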
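To make the nesting concrete, here is a minimal sketch that builds a stand-in dictionary with the same layout (nuisance part, then treatment, then repetition, then fold) and walks it. The helpers `make_mock_store` and `iter_models` are hypothetical and not part of DoubleML; the strings stand in for the fitted `RandomForestRegressor` objects so the sketch runs without fitting anything.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Layout of the stored models: nuisance part -> treatment -> [rep][fold]
ModelStore = Dict[str, Dict[str, List[List[Any]]]]


def make_mock_store(learners: List[str], treatments: List[str],
                    n_rep: int, n_folds: int) -> ModelStore:
    """Hypothetical stand-in for dml_plr_obj.models; strings replace fitted models."""
    return {
        learner: {
            treat: [[f"{learner}/{treat}/rep{r}/fold{k}" for k in range(n_folds)]
                    for r in range(n_rep)]
            for treat in treatments
        }
        for learner in learners
    }


def iter_models(models: ModelStore) -> Iterator[Tuple[str, str, int, int, Any]]:
    """Yield (nuisance_part, treatment, i_rep, i_fold, model) for every stored model."""
    for learner, per_treat in models.items():
        for treat, reps in per_treat.items():
            for i_rep, folds in enumerate(reps):
                for i_fold, model in enumerate(folds):
                    yield learner, treat, i_rep, i_fold, model


# Same settings as the example: two nuisance parts, one treatment, n_rep=1, n_folds=5
models = make_mock_store(["ml_l", "ml_m"], ["d"], n_rep=1, n_folds=5)

# One model: nuisance part 'ml_m', treatment 'd', first repetition, third fold
print(models["ml_m"]["d"][0][2])  # ml_m/d/rep0/fold2

# Total number of stored models: 2 * 1 * 1 * 5
print(sum(1 for _ in iter_models(models)))  # 10
```

With a real fitted object, the same indexing should apply to `dml_plr_obj.models`, e.g. `dml_plr_obj.models['ml_m']['d'][0][2]` for the `ml_m` learner fitted in the third fold of the first repetition, whose scikit-learn attributes such as `feature_importances_` can then be inspected.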
PR Checklist
- [x] The title of the pull request summarizes the changes made.
- [x] The PR contains a detailed description of all changes and additions.
- [x] The code passes all (unit) tests.
- [x] Enhancements or new features are equipped with unit tests.
- [x] The changes adhere to the PEP8 standards.