
How to handle S@98 estimation when building an honest forest

Open adam2392 opened this issue 1 year ago • 2 comments

  1. Rejection bootstrap sampling: if a bootstrap sample does not have enough control samples (e.g. 50 for S@98) to estimate S@98 properly, then reject those bootstrap-sampled indices and resample
  2. Upweight the sample weights based on class: this is the strategy sklearn currently has
  3. Stratify bootstrap sample:
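
A minimal sketch of option 1, to make the idea concrete. This is not scikit-tree or sklearn code: the function name `rejection_bootstrap`, its parameters, the `max_tries` cap, and the convention that label `0` marks a control are all assumptions made for illustration. It uses only the standard library.

```python
import random


def rejection_bootstrap(y, min_controls=50, control_label=0, max_tries=1000, seed=None):
    """Draw bootstrap indices, rejecting draws with too few control samples.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(y)
    for _ in range(max_tries):
        # Ordinary bootstrap: sample n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        # Count how many drawn samples are controls (duplicates included).
        n_controls = sum(1 for i in idx if y[i] == control_label)
        if n_controls >= min_controls:
            return idx
        # Otherwise reject this draw and try again.
    raise RuntimeError(
        f"no bootstrap sample with >= {min_controls} controls in {max_tries} tries"
    )


# Usage: 60 controls (label 0) and 40 cases (label 1).
y = [0] * 60 + [1] * 40
indices = rejection_bootstrap(y, min_controls=50, seed=0)
```

The `max_tries` cap is one possible guard against looping forever when the data cannot satisfy the threshold. Note that rejecting draws changes the sampling distribution relative to a plain bootstrap.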

adam2392 avatar Feb 20 '24 21:02 adam2392

My inclination is to just do 1.

adam2392 avatar Feb 20 '24 21:02 adam2392

Our estimate is broken, as evidenced by the linear simulation and by the fact that more trees make the accuracy worse. So, let's fix that. Is that a new issue, or this issue?

jovo avatar Oct 07 '24 15:10 jovo