Store estimated models for nuisance parameters

Open MalteKurz opened this issue 2 years ago • 0 comments

Description

This PR implements the frequently requested feature of storing the estimated models for the nuisance parameters. To use it, call the method fit() with the option store_models=True. Example:

import numpy as np
import doubleml as dml
from doubleml.datasets import make_plr_CCDDHNR2018
from sklearn.ensemble import RandomForestRegressor
from sklearn.base import clone

np.random.seed(3141)

# Clone the base learner so that each nuisance part gets its own estimator instance
learner = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
ml_l = clone(learner)
ml_m = clone(learner)

obj_dml_data = make_plr_CCDDHNR2018(alpha=0.5, n_obs=500, dim_x=20)
dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_l, ml_m)

# store_models=True keeps the fitted nuisance models on the object
dml_plr_obj.fit(store_models=True)

The estimated models can then be found in the attribute dml_plr_obj.models:

dml_plr_obj.models
{'ml_l': {'d': [[RandomForestRegressor(), RandomForestRegressor(), RandomForestRegressor(), RandomForestRegressor(), RandomForestRegressor()]]}, 'ml_m': {'d': [[RandomForestRegressor(), RandomForestRegressor(), RandomForestRegressor(), RandomForestRegressor(), RandomForestRegressor()]]}}

Note that the number of fitted models depends on the settings and the model under consideration. The outer dictionary contains one entry for each nuisance part (here ml_l and ml_m). For each nuisance part there is a dictionary containing one entry for each treatment variable (here only 'd'). Each of these entries is a list of length n_rep (repeated cross-fitting), and each element of that list is in turn a list of length n_folds (number of folds per repetition). In the example above n_rep is 1 and n_folds is 5, so each nuisance part holds one inner list with five fitted models.
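The nesting described above can be walked with a small helper. The following is only a sketch: iter_stored_models is a hypothetical helper and not part of doubleml, and the toy dictionary merely mimics the shape of dml_plr_obj.models (strings stand in for the fitted estimators) so that the snippet runs without fitting anything.

```python
from typing import Any, Dict, Iterator, List, Tuple


def iter_stored_models(
    models: Dict[str, Dict[str, List[List[Any]]]]
) -> Iterator[Tuple[str, str, int, int, Any]]:
    """Yield (learner, treatment, i_rep, i_fold, model) for every stored model.

    Hypothetical helper; it only assumes the nesting
    learner -> treatment variable -> repetition -> fold.
    """
    for learner_name, per_treatment in models.items():
        for treat_var, per_rep in per_treatment.items():
            for i_rep, per_fold in enumerate(per_rep):
                for i_fold, model in enumerate(per_fold):
                    yield learner_name, treat_var, i_rep, i_fold, model


# Toy stand-in with the same shape as the example output (n_rep=1, n_folds=5).
# Strings take the place of the fitted RandomForestRegressor objects.
toy_models = {
    "ml_l": {"d": [[f"l_fold{k}" for k in range(5)]]},
    "ml_m": {"d": [[f"m_fold{k}" for k in range(5)]]},
}

entries = list(iter_stored_models(toy_models))
for learner_name, treat_var, i_rep, i_fold, model in entries:
    print(learner_name, treat_var, i_rep, i_fold, model)
```

With a real fit, passing dml_plr_obj.models instead of toy_models should yield the fitted estimators themselves, so a single model can also be reached directly, e.g. dml_plr_obj.models['ml_l']['d'][0][0] for the first fold of the first repetition.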

PR Checklist

  • [x] The title of the pull request summarizes the changes made.
  • [x] The PR contains a detailed description of all changes and additions.
  • [x] The code passes all (unit) tests.
  • [x] Enhancements or new features are equipped with unit tests.
  • [x] The changes adhere to the PEP8 standards.

MalteKurz, Sep 16 '22 13:09