
Option to store nuisance parameter objects

Open MCKnaus opened this issue 1 year ago • 2 comments

This is not an issue, but a feature suggestion.

Currently, the causal_forest objects include the estimated nuisance parameters Y.hat, W.hat, ... However, the underlying regression_forest objects are gone for good (at least as far as I can see).

For a new paper that extracts the outcome weights of the point estimates (https://arxiv.org/abs/2411.11559), I need the get_forest_weights output of the forest behind Y.hat. My current workaround is to estimate Y.hat externally so that I keep its forest object (see this notebook for an illustration of why and how).

I perfectly understand that storing nuisance parameter (NP) objects is unattractive because of the memory cost. However, a store_nuisance_parameters option would be quite useful for my purposes, ideally with control over which of the multiple NP objects are saved.

Just FYI that there would be a consumer of such a feature.

Thank you in any case for your great work!

MCKnaus, Nov 20 '24 09:11

Hi @MCKnaus, thank you very much for the suggestion (and also for the interesting reference). For cases with custom outcome/propensity models, the intended design was to construct those models outside. If you need to carry those objects with you in downstream tasks, then maybe a very simple option could be to just store them in the returned causal forest object? Like my.forest = causal_forest(X,Y,W,Y.hat=...); my.forest$Y.model = Y.model, then whenever you need access to that causal forest's Y.model you'd just access my.forest$Y.model.
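
The suggestion above can be sketched as follows. This is only a minimal illustration on simulated data, assuming the outcome model is a grf regression_forest fitted outside causal_forest. The element name Y.model is a convention taken from this thread, not a field that grf itself creates or reads.

```r
library(grf)

# Simulated data, for illustration only
set.seed(42)
n <- 500
p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, 1], 0) * W + X[, 2] + rnorm(n)

# Fit the outcome model outside causal_forest so the forest object is retained
Y.model <- regression_forest(X, Y)
Y.hat <- predict(Y.model)$predictions  # out-of-bag predictions

# Pass the external predictions, then attach the model to the returned object.
# Y.model is just an extra list element here; grf does not use it internally.
my.forest <- causal_forest(X, Y, W, Y.hat = Y.hat)
my.forest$Y.model <- Y.model

# Later, wherever the causal forest is available, recover the
# n x n (sparse) weight matrix of the outcome model
S <- get_forest_weights(my.forest$Y.model)
```

Because the outcome forest travels with my.forest, downstream code only needs the causal forest object to reach the weights, at the cost of keeping the additional forest in memory.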

erikcs, Nov 24 '24 08:11

Thank you for the suggestion. I will add this as an option when revising the OutcomeWeights package. Then users do not need to store the smoother matrix explicitly. It would also be immediately compatible if you decide to include this feature and use Y.model as the label. In the best case, it eventually boils down to running

my.cf = causal_forest(X, Y, W, store_nuisance_parameter = "Y")
omega = get_outcome_weights(my.cf)

where get_outcome_weights() calls my.cf$Y.model internally.

But feel free to ignore.

MCKnaus, Nov 25 '24 10:11