
Add `minDistinctLabels` to decision tree to prevent UQ collapse in Bagger

Open maxhutch opened this issue 5 years ago • 1 comments

If the training labels contain repeated values, it becomes more likely that every tree in the ensemble makes the same prediction (even if the input values are different). This could be prevented by imposing a minimum number of distinct label values in the leaves of the decision trees. That would significantly increase the likelihood that different trees have different pairs of label values in the leaf that a prediction lands in, so that the trees make different predictions and the ensemble reports some predictive uncertainty.
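A minimal sketch of the proposed constraint, written in Python for illustration rather than against lolo's actual code. The names `count_distinct`, `split_allowed`, and `min_distinct_labels` are hypothetical; the idea is that a candidate split is rejected whenever either child would contain fewer distinct label values than the minimum.

```python
from typing import Sequence


def count_distinct(labels: Sequence[float]) -> int:
    """Number of distinct label values in a candidate leaf."""
    return len(set(labels))


def split_allowed(
    left_labels: Sequence[float],
    right_labels: Sequence[float],
    min_distinct_labels: int = 2,  # hypothetical parameter, mirroring `minDistinctLabels`
) -> bool:
    """Accept a split only if both children keep enough distinct label values."""
    return (
        count_distinct(left_labels) >= min_distinct_labels
        and count_distinct(right_labels) >= min_distinct_labels
    )


# The left child here holds only the repeated value 1.0, so the split is rejected.
print(split_allowed([1.0, 1.0], [3.0, 4.0]))
# Both children hold at least two distinct values, so the split is accepted.
print(split_allowed([1.0, 1.0, 2.0], [3.0, 4.0]))
```

With a check like this in the splitter, a leaf could never consist of a single repeated label value, so bootstrap resamples would be more likely to produce leaves with different means.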

cc: @bfolie

maxhutch avatar Dec 04 '19 21:12 maxhutch

An alternative: simply set a predicted uncertainty floor that depends on the variance of the training labels and the number of training rows.
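The comment does not say what form the floor would take. The sketch below assumes, purely for illustration, a floor of `sqrt(variance / n)`, where `variance` is the population variance of the training labels and `n` is the number of training rows. The function names are hypothetical and are not part of lolo.

```python
import math
from statistics import pvariance
from typing import Sequence


def uncertainty_floor(train_labels: Sequence[float]) -> float:
    """Assumed floor: sqrt(label variance / number of training rows)."""
    n = len(train_labels)
    if n == 0:
        raise ValueError("need at least one training label")
    return math.sqrt(pvariance(train_labels) / n)


def apply_floor(predicted_std: float, train_labels: Sequence[float]) -> float:
    """Never report an uncertainty below the floor."""
    return max(predicted_std, uncertainty_floor(train_labels))


labels = [1.0, 1.0, 3.0, 3.0]
# A collapsed ensemble reports zero spread; the floor replaces it.
print(apply_floor(0.0, labels))
# A prediction that already has a larger uncertainty is left unchanged.
print(apply_floor(2.0, labels))
```

Any other monotone function of the label variance and row count would fit the same pattern; the point is only that the reported uncertainty is clamped from below.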

maxhutch avatar Dec 05 '19 16:12 maxhutch