lolo
Unbias standard deviation estimator
`getStdDevMean` currently uses ~~a biased variance estimator~~ the square root of the sample variance. This should be unbiased by replacing the denominator with ~~`treePredictions.length - 1`~~ `treePredictions.length - 1.5` or a similar higher-order bias correction. This should be done with care to avoid introducing a bias into the jackknife code, which couples to the same `treeVariance` (probably best to just rescale in `getStdDevMean`, which already takes a sqrt).
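A minimal sketch of the "rescale in `getStdDevMean`" approach, assuming the tree predictions are available as a `Seq[Double]` and that the shared variance keeps its Bessel (N - 1) denominator. The object and method names (`StdDevSketch`, `treeVariance`, `rescaledStdDev`) are hypothetical and are not lolo's actual API. Multiplying the variance by (N - 1)/(N - 1.5) inside the square root is equivalent to using an N - 1.5 denominator for the standard deviation, while the variance consumed by the jackknife code stays unchanged.

```scala
// Hypothetical sketch: names are illustrative, not lolo's real API.
object StdDevSketch {

  /** Sum of squared deviations of the tree predictions from their mean. */
  def sumSquaredDeviations(treePredictions: Seq[Double]): Double = {
    val mean = treePredictions.sum / treePredictions.length
    treePredictions.map(p => (p - mean) * (p - mean)).sum
  }

  /** Bessel-corrected variance (denominator N - 1); assumed to be the value
    * shared with other code such as the jackknife, so it is left as-is. */
  def treeVariance(treePredictions: Seq[Double]): Double = {
    require(treePredictions.length >= 2, "need at least two trees")
    sumSquaredDeviations(treePredictions) / (treePredictions.length - 1)
  }

  /** Standard deviation with an effective N - 1.5 denominator, obtained by
    * rescaling the Bessel-corrected variance before taking the square root. */
  def rescaledStdDev(treePredictions: Seq[Double]): Double = {
    val n = treePredictions.length.toDouble
    math.sqrt(treeVariance(treePredictions) * (n - 1.0) / (n - 1.5))
  }
}
```

For example, with predictions `Seq(1.0, 2.0, 3.0, 4.0)` the sum of squared deviations is 5, so `rescaledStdDev` gives sqrt(5 / 2.5) = sqrt(2) ≈ 1.414, versus sqrt(5 / 3) ≈ 1.291 with the Bessel correction alone.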
@mrupp-citrine made a good point in reviewing #217: we should care more about debiasing the standard deviation estimate. So, although #216 adds a Bessel correction, we should at least use an N - 3/2 correction so that the standard deviation is unbiased to second order.
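To illustrate why N - 3/2 rather than N - 1, assume (this is an assumption for illustration, not something the issue states) that the N tree predictions are i.i.d. normal with variance $\sigma^2$, so the sum of squared deviations $S$ satisfies $S/\sigma^2 \sim \chi^2_{N-1}$. The standard expansion of the chi distribution's mean then gives:

```math
\begin{aligned}
\mathbb{E}\left[\sqrt{S}\right] &= \sigma\sqrt{2}\,\frac{\Gamma(N/2)}{\Gamma\left((N-1)/2\right)}
  = \sigma\sqrt{N-1}\left(1 - \frac{1}{4(N-1)} + \frac{1}{32(N-1)^2} + O\left(N^{-3}\right)\right), \\
\mathbb{E}\left[\sqrt{\frac{S}{N-1}}\right] &= \sigma\left(1 - \frac{1}{4N} + O\left(N^{-2}\right)\right), \\
\mathbb{E}\left[\sqrt{\frac{S}{N-3/2}}\right] &= \sigma\left(1 + \frac{1}{16(N-1)^2} + O\left(N^{-3}\right)\right).
\end{aligned}
```

Under this assumption, the Bessel correction leaves an O(1/N) bias in the standard deviation, while the N - 3/2 denominator pushes the bias to O(1/N²). For non-normal tree predictions the leading bias term also depends on the kurtosis, so this is only a guide.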
@mrupp-citrine suggests the following, which is within scope of this issue:
This matters only for small ensembles. Still, documenting the answers to these questions in source comments would help us remember later why these decisions (e.g., to use the Bessel correction) were made.