skgrf
How should tuning be implemented?
GRF includes tuning facilities for many of its estimators. In particular, the following estimators have tuning parameter options:
- Regression forest
- Causal forest
- Instrumental forest
- Local linear forest
- Boosted forest
- Causal survival forest
In addition, some forests use tuning implicitly, or pass tuning parameters down into internal forests (a sketch of this orthogonalization step follows the list):
- Causal forest performs tuning but also passes tuning parameters down into the orthogonalization forests (regression and boosted), where tuning is performed separately.
- Instrumental forest performs tuning but also passes tuning parameters down into the orthogonalization regression forest, where tuning is performed separately.
- Boosted forest uses tuning parameters on the initial forest, but not on the subsequent boosted forests.
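To make the orthogonalization step concrete, here is a minimal sketch of the two nuisance forests a causal forest fits, each of which carries its own tuning in R-grf. The class name `GRFForestRegressor` and the use of in-sample predictions are assumptions for illustration (R-grf actually uses out-of-bag nuisance predictions):

```python
import numpy as np
from skgrf.ensemble import GRFForestRegressor  # class name assumed

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
w = rng.binomial(1, 0.5, size=500).astype(float)  # treatment assignment
y = X[:, 0] + w * X[:, 1] + rng.normal(size=500)  # outcome

# Each nuisance (orthogonalization) forest is fit separately; in R-grf,
# each of these fits is also tuned separately.
y_hat = GRFForestRegressor().fit(X, y).predict(X)
w_hat = GRFForestRegressor().fit(X, w).predict(X)

# The causal forest then estimates treatment effects on the centered
# (residualized) outcomes and treatments.
y_centered = y - y_hat
w_centered = w - w_hat
```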
Scikit-learn also provides facilities for hyperparameter tuning in its `model_selection` module. This raises the question: when and where in skgrf should tuning be implemented, if at all? There are several options:
- Make skgrf a true port of R-grf. This means implementing tuning exactly as it exists in the R library, ignoring scikit-learn model selection, and hardcoding tuning in the same way.
- Ignore R-grf's tuning entirely, allowing users to utilize the `model_selection` module (sketched after this list). This means, however, that the implementations for Causal, Instrumental, and Boosted forests would differ from those in R.
- Selectively implement R-grf's tuning, in order to maintain parity with R-grf's implicit tuning. This is the current implementation.
- Refactor some of the estimators to allow more fine-grained control over tuning separate components, removing tuning from skgrf and allowing users to tune with `model_selection` objects.
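As a sketch of the second and fourth options, a skgrf estimator that follows scikit-learn conventions can be tuned directly with `model_selection` objects. The class name and the tunable parameter names below are assumptions patterned after R-grf's tunable parameters (`min.node.size`, `sample.fraction`), not a confirmed skgrf API:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from skgrf.ensemble import GRFForestRegressor  # class name assumed

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] + rng.normal(size=200)

# Parameter names mirror R-grf's tunable parameters; they are assumed
# to be exposed as scikit-learn-style constructor arguments.
param_grid = {
    "min_node_size": [5, 10, 20],
    "sample_fraction": [0.35, 0.5],
}

search = GridSearchCV(GRFForestRegressor(), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

Under this approach, R-grf's bespoke tuner is bypassed entirely and replaced by cross-validated search; the trade-off is the divergence from R-grf noted above.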