
How should tuning be implemented?


GRF includes tuning facilities for many of its estimators. In particular, the following estimators expose tuning parameter options:

  • Regression forest
  • Causal forest
  • Instrumental forest
  • Local linear forest
  • Boosted forest
  • Causal survival forest

In addition, some forests use tuning implicitly and/or pass tuning parameters down into internal forests (a rough sketch of this pass-down pattern follows the list):

  • Causal forest performs tuning but also passes tuning parameters down into the orthogonalization forests (regression and boosted), in which tuning is performed separately.
  • Instrumental forest performs tuning but also passes tuning parameters down into the orthogonalization regression forest, in which tuning is performed separately.
  • Boosted forest applies tuning parameters to the initial forest, but not to the subsequent boosted forests.
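As a rough illustration of that pass-down behavior, here is a minimal Python sketch. All class and argument names (`CausalForest`, `RegressionForest`, `tune_params`) are hypothetical, not grf's or skgrf's actual API:

```python
# Hypothetical sketch of how a causal forest forwards tuning parameters
# to its internal orthogonalization forests, which then tune themselves
# independently. Names are illustrative, not the actual skgrf API.
class RegressionForest:
    def __init__(self, tune_params=None):
        # The orthogonalization forest runs its own tuning pass.
        self.tune_params = tune_params


class CausalForest:
    def __init__(self, tune_params=None):
        # The causal forest tunes itself with tune_params...
        self.tune_params = tune_params
        # ...and also forwards them to the internal forests used to
        # orthogonalize the outcome and the treatment, each of which
        # performs its own separate tuning.
        self.y_hat_forest = RegressionForest(tune_params=tune_params)
        self.w_hat_forest = RegressionForest(tune_params=tune_params)
```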

Scikit-learn also provides facilities for hyperparameter tuning under the model_selection module. This raises the question: when and where in skgrf should tuning be implemented, if at all?
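For concreteness, tuning a skgrf estimator through model_selection would look roughly like the sketch below. The estimator name and import path (`skgrf.ensemble.GRFRegressor`) and the grf-style parameters in the grid are assumptions; skgrf's actual API may differ:

```python
# Sketch: tuning a skgrf estimator with sklearn's model_selection tools
# instead of grf's built-in tuning facilities.
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV

from skgrf.ensemble import GRFRegressor  # hypothetical name/path

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

search = GridSearchCV(
    GRFRegressor(),
    param_grid={"mtry": [2, 4], "min_node_size": [5, 10]},  # illustrative
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```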

  1. Make skgrf a true port of R-grf. This means implementing tuning exactly as it exists in the R library, ignoring sklearn model selection, and hardcoding tuning in the same way.

  2. Ignore R-grf's tuning entirely, allowing users to utilize the model_selection module. This means, however, that the implementations for Causal, Instrumental, and Boosted forests would differ from what exists in R.

  3. Selectively implement R-grf's tuning in order to maintain parity with its implicit tuning. This is the current implementation.

  4. Refactor some of the estimators to allow more fine-grained control, so that separate components can be tuned independently; remove tuning from skgrf and allow users to tune with model_selection objects (sketched below).
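To illustrate option 4: grf's R causal_forest already accepts precomputed Y.hat and W.hat, so a refactored skgrf could expose the same hook and let users tune each component separately. A minimal sketch, assuming hypothetical names and signatures (`GRFRegressor`, `GRFCausalRegressor`, the `y_hat`/`w_hat` keyword arguments, and the `fit(X, y, w)` signature):

```python
# Hypothetical sketch of option 4: tune the orthogonalization forests
# independently via sklearn, then hand their predictions to the causal
# forest so it skips internal orthogonalization and tuning entirely.
# All skgrf names and signatures here are assumptions for illustration.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV

from skgrf.ensemble import GRFCausalRegressor, GRFRegressor  # assumed names

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
w = np.random.default_rng(0).binomial(1, 0.5, size=len(y))  # treatment

param_grid = {"min_node_size": [5, 10]}  # illustrative grf-style parameter

# Tune each orthogonalization (nuisance) forest independently.
y_search = GridSearchCV(GRFRegressor(), param_grid, cv=3).fit(X, y)
w_search = GridSearchCV(GRFRegressor(), param_grid, cv=3).fit(X, w)

# Pass the tuned nuisance estimates to the causal forest (hypothetical
# kwargs, mirroring R-grf's Y.hat / W.hat arguments).
causal = GRFCausalRegressor(y_hat=y_search.predict(X), w_hat=w_search.predict(X))
causal.fit(X, y, w)
```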

crflynn · Feb 21, 2021