scikit-tree

Should `HonestForest*` have `bootstrap=False` or `bootstrap=True` as the default?

Open · adam2392 opened this issue 2 years ago • 2 comments

It is unclear what the default should be, because in scikit-learn `bootstrap=True` is the default for forests.

cc: @rflperry @sampan501 mentioned that your original implementation had `bootstrap=False` as the default. To my knowledge, there is no HonestForest-specific reason for that default, so I'm wondering if we should stick with the scikit-learn defaults?

adam2392, Oct 16 '23 18:10

I believe the initial HonestForest implementation took its defaults from the Generalized Random Forest package in R (GRF). Honest forests already learn each tree on a subsample because of the whole idea of "honesty": one part of the data determines the tree structure, and a disjoint part is used only for the leaf estimates. When `bootstrap=True`, I believe what we do (and what GRF does) is bootstrap the structure-learning subset of the data. In a regular forest, bootstrapping is useful because it helps decorrelate the trees. It's not clear that this is needed on top of the sample splitting already present in honest trees.
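A minimal sketch of the per-tree sampling scheme described above, assuming an `honest_fraction` parameter that sets the share of samples reserved for leaf estimation. `honest_split_indices` is a hypothetical helper for illustration only; it is not the scikit-tree or GRF API.

```python
import numpy as np


def honest_split_indices(n_samples, honest_fraction=0.5, bootstrap=False, random_state=None):
    """Hypothetical helper: split sample indices for one honest tree.

    Returns (structure_idx, honest_idx). The structure subset is used to
    choose splits; the disjoint honest subset only fills leaf estimates.
    If bootstrap=True, only the structure subset is resampled with
    replacement (an assumption mirroring the scheme described above).
    """
    rng = np.random.default_rng(random_state)
    perm = rng.permutation(n_samples)
    n_honest = int(round(honest_fraction * n_samples))
    honest_idx = np.sort(perm[:n_honest])
    structure_idx = perm[n_honest:]
    if bootstrap:
        # Draw a bootstrap sample of the same size, but only from the
        # structure pool, so no honest sample leaks into split selection.
        structure_idx = rng.choice(structure_idx, size=structure_idx.size, replace=True)
    return np.sort(structure_idx), honest_idx


structure, honest = honest_split_indices(10, honest_fraction=0.5, bootstrap=True, random_state=0)
print(structure, honest)  # structure may contain repeats; the two sets never overlap
```

Because resampling only touches the structure pool, honesty (disjointness of the two subsets) holds whether `bootstrap` is `True` or `False`; the flag only adds extra randomness to the structure-learning step, which is the part whose usefulness is in question.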

rflperry, Oct 16 '23 20:10

For now, we will keep `bootstrap=False` as the default, since changing it would break backwards compatibility with the existing unit tests, but we can explore what happens if we change it.

adam2392, May 06 '24 14:05