EconML Reusing first-stage models for multiple `econml.dml.NonParamDML` runs

Hi there,

I'm trying to apply DML to a case where there's only one treatment intervention, but multiple outcome metrics. To save on compute time, I'd like to train model_t once and reuse it for all the DML runs, training one model_y per run.

Is this functionality supported?

Jun 29 '22 20:06 nmriabov

I can't think of a clean way to do this exactly as you're suggesting, but one alternative is to pass all of the outcome metrics as a single multi-column array Y and to specify a composite model for model_y that runs whatever estimation you want on each column independently. This should be equivalent to computing all of your estimates at once using the same model_t, without retraining it.
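As a rough sketch of what such a composite model_y could look like, scikit-learn's `MultiOutputRegressor` fits one copy of a base regressor per column of Y. The data and the choice of `LinearRegression` below are placeholders for illustration, and passing the result to an EconML estimator is not shown or verified here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

# Synthetic placeholder data: 200 rows, 3 features, 2 outcome metrics.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = np.column_stack([
    X[:, 0] + rng.normal(scale=0.1, size=200),
    2.0 * X[:, 1] + rng.normal(scale=0.1, size=200),
])

# One independent LinearRegression is fit per column of Y.
model_y = MultiOutputRegressor(LinearRegression())
model_y.fit(X, Y)

Y_hat = model_y.predict(X)            # one prediction column per outcome
n_fitted = len(model_y.estimators_)   # one fitted base model per outcome
```

Any regressor with `fit` and `predict` can stand in for `LinearRegression`, so each outcome column gets its own independent fit.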

Jul 08 '22 04:07 kbattocchi

You could use a wrapper like the one below. Instantiate the new class ahead of time; as long as you pass that same instance as the propensity model to all of the estimators, it should only be fit the first time.

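class FitOnceWrapper(YourPropensityModelClass):
    """Fits the underlying model on the first call to fit() and skips later calls."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.is_fitted = False

    def fit(self, *args, **kwargs):
        # Only train on the first call; later calls leave the fitted state untouched.
        if not self.is_fitted:
            super().fit(*args, **kwargs)
            self.is_fitted = True  # set only after a successful fit
        return self  # scikit-learn convention: fit() returns the estimator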
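To see the fit-once behaviour in isolation, here is a minimal check that calls `fit` directly on the same instance twice, outside of EconML. `FitOnceLogit` is a hypothetical variant of the wrapper above built on `LogisticRegression`; it tracks the flag with `getattr` instead of overriding `__init__`, and the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class FitOnceLogit(LogisticRegression):
    """Hypothetical fit-once variant: trains on the first fit() call only."""

    def fit(self, X, y, sample_weight=None):
        if not getattr(self, "is_fitted", False):
            super().fit(X, y, sample_weight=sample_weight)
            self.is_fitted = True
        return self

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)

model = FitOnceLogit().fit(X, y)
coef_before = model.coef_.copy()

# Second call with flipped labels: skipped, so the coefficients do not change.
model.fit(X, 1 - y)
unchanged = np.array_equal(coef_before, model.coef_)
```

This only covers repeated calls on the same object; it says nothing about what happens when a library copies the estimator before fitting it.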

Jul 08 '22 09:07 EgorKraevTransferwise

@EgorKraevTransferwise, am I misunderstanding your intended approach? econml.dml.NonParamDML uses the sklearn.base.clone function on the instance, meaning it will work with an unfitted copy of the estimator. Thus each call to DML will retrain model_t.
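The cloning behaviour can be checked with scikit-learn alone. The sketch below uses a plain `LinearRegression` on synthetic data as a stand-in for model_t; it only shows what `sklearn.base.clone` does to a fitted estimator and does not call EconML.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
t = X[:, 0] + rng.normal(scale=0.1, size=100)

model_t = LinearRegression().fit(X, t)

# clone() builds a new estimator from the constructor parameters only,
# so learned attributes such as coef_ are not carried over.
model_t_clone = clone(model_t)

original_is_fitted = hasattr(model_t, "coef_")
clone_is_fitted = hasattr(model_t_clone, "coef_")
```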

Edit: I'm interested in re-using model_t and model_y elements (e.g., comparing LinearDML and CausalForestDML without having to re-compute the nuisance functions). The approach that I went with was to write a second wrapper:

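class UncloneableEstimator:
    # No get_params(), so sklearn's clone() cannot rebuild this as a fresh, unfitted estimator.
    def __init__(self, estimator):
        # Expose the wrapped estimator's bound methods directly.
        self.fit = estimator.fit
        self.predict = estimator.predict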

and then pass instances of the fitted models like so: model_y = UncloneableEstimator(my_model_y), where my_model_y is an instance of FitOnceWrapper. Something similar is done for model_t. My treatments are continuous, but I'm assuming dichotomous treatments require adding self.predict_proba = estimator.predict_proba to the initialization of an UncloneableEstimator instance. I'm not sure if this is bad practice, but it seems to work for my needs. It would be nice if EconML methods took some sort of "pre-compute=True" flag for cases where users provide instances of estimators rather than classes of estimators, but the workarounds seem easy enough that it's likely not a high-priority addition.
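One possible explanation for why this works, sketched with scikit-learn only: when an object has no `get_params` method, `clone(obj, safe=False)` falls back to `copy.deepcopy`, which keeps the fitted state. Whether EconML calls `clone` with `safe=False` is an assumption here and is not verified; the data and the plain `LinearRegression` are placeholders.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression

class UncloneableEstimator:
    def __init__(self, estimator):
        self.fit = estimator.fit
        self.predict = estimator.predict

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X[:, 0] + rng.normal(scale=0.1, size=100)

fitted = LinearRegression().fit(X, y)
wrapped = UncloneableEstimator(fitted)

# No get_params(), so clone(safe=False) deep-copies the wrapper,
# including the fitted estimator behind its bound methods.
copied = clone(wrapped, safe=False)
preds_match = np.allclose(copied.predict(X), fitted.predict(X))
```

Calling `copied.fit` would still retrain a plain estimator, which is why the thread combines this wrapper with a fit-once guard on the wrapped model.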

Jul 29 '22 03:07 jamesnordlund
