scikit-tree
How to handle S@98 estimation when building an honest forest
- Rejection bootstrap sampling: if a bootstrap sample does not contain enough control samples (e.g. 50 for S@98) to estimate S@98 properly, reject the sampled indices and draw again
- Upweight the sample weights based on class: this is the strategy sklearn currently uses
- Stratify bootstrap sample:
My inclination is to just do option 1 (rejection bootstrap sampling).
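
A rough sketch of what rejection bootstrap sampling could look like, just to make the idea concrete. This is not code from scikit-tree: the function name `rejection_bootstrap` and the parameters `min_controls`, `control_label`, and `max_tries` are all hypothetical, and it assumes controls are labelled `0` and that the in-bag sample is the one that needs enough controls.

```python
import numpy as np


def rejection_bootstrap(y, min_controls=50, control_label=0, max_tries=1000, seed=None):
    """Draw bootstrap indices, redrawing until enough control samples are present.

    Hypothetical helper for illustration only; not part of scikit-tree or sklearn.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    n_samples = y.shape[0]

    for _ in range(max_tries):
        # Standard bootstrap: sample n_samples indices with replacement.
        indices = rng.integers(0, n_samples, size=n_samples)

        # Count how many drawn samples are controls (assumed label: control_label).
        n_controls = np.count_nonzero(y[indices] == control_label)

        # Accept the draw only if it has enough controls; otherwise redraw.
        if n_controls >= min_controls:
            return indices

    # Guard against looping forever when the requirement cannot be met.
    raise RuntimeError(
        f"No bootstrap sample with at least {min_controls} controls "
        f"found in {max_tries} tries."
    )
```

The `max_tries` cap is there because rejection can loop indefinitely when the dataset itself has too few controls; in that case failing loudly seems better than silently returning a sample that cannot support the estimate.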
Our estimate is broken, as evidenced by the linear simulation and by the fact that adding more trees makes the accuracy worse. So let's fix that. Should that be a new issue, or part of this one?