Unbias standard deviation estimator

Open gregor-robinson opened this issue 4 years ago • 2 comments

getStdDevMean currently uses ~a biased variance estimator~ the square root of the sample variance, which is a biased estimate of the standard deviation. It should be debiased by replacing the denominator with ~treePredictions.length - 1~ treePredictions.length - 1.5, or a similar superlinear bias correction. This needs care to avoid introducing a bias into the jackknife code, which couples to the same treeVariance; it is probably best to just rescale in getStdDevMean, which already takes a sqrt.
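A minimal sketch of the "just rescale" idea, in Python for illustration rather than as lolo's actual code. The function name `rescaled_std_dev` and its parameters are hypothetical, and it assumes the shared variance was computed with a denominator of `n - variance_ddof`. The shared variance is left untouched, so anything else that reads it (such as the jackknife code) sees the same value as before; only the final square root is rescaled.

```python
import math


def rescaled_std_dev(tree_variance, n, variance_ddof=1.0, target_ddof=1.5):
    """Rescale an existing variance estimate before taking the square root.

    Hypothetical helper: `tree_variance` is assumed to have been computed
    with denominator (n - variance_ddof). The result is the standard
    deviation that a denominator of (n - target_ddof) would have produced.
    """
    if n <= target_ddof:
        raise ValueError("ensemble too small for the requested correction")
    # Undo the old denominator and apply the new one in a single factor.
    scale = (n - variance_ddof) / (n - target_ddof)
    return math.sqrt(tree_variance * scale)


if __name__ == "__main__":
    predictions = [1.0, 2.0, 3.0, 4.0]
    n = len(predictions)
    mean = sum(predictions) / n
    # Bessel-corrected sample variance (denominator n - 1).
    sample_variance = sum((p - mean) ** 2 for p in predictions) / (n - 1)
    print(rescaled_std_dev(sample_variance, n))
```

Keeping the correction as a single multiplicative factor means the choice of denominator lives in one place and can be documented there.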

gregor-robinson · Mar 12 '20 19:03

@mrupp-citrine made the good point in reviewing #217 that we should care more about debiasing the standard deviation estimate. So, although #216 adds a Bessel correction (denominator N - 1), we should at least use a denominator of N - 3/2 so that the standard deviation is unbiased to second order.
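To see how much the choice of denominator matters, the relative bias can be computed exactly under one added assumption: that the tree predictions are independent draws from a normal distribution (the thread does not state this). In that case the expected value of the Bessel-corrected sample standard deviation is the true value times the well-known factor c4(N). The function names below are illustrative only.

```python
import math


def c4(n):
    """E[s] / sigma for n normal samples, where s uses denominator n - 1."""
    # Gamma(n/2) / Gamma((n-1)/2), computed via lgamma for numerical stability.
    log_ratio = math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def relative_bias(n, ddof):
    """E[estimate] / sigma - 1 when the variance denominator is n - ddof."""
    # Changing the denominator only rescales s by sqrt((n - 1) / (n - ddof)).
    return math.sqrt((n - 1) / (n - ddof)) * c4(n) - 1.0


if __name__ == "__main__":
    for n in (2, 5, 10, 50):
        print(n, relative_bias(n, 1.0), relative_bias(n, 1.5))
```

Under this normality assumption the Bessel-corrected estimate is biased low, while the N - 3/2 denominator leaves a much smaller residual bias, and the gap between the two shrinks as the ensemble grows.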

gregor-robinson · Mar 25 '20 16:03

@mrupp-citrine suggests the following, which is within scope of this issue:

The above matters only for small ensembles. Still, documenting (in source comments) the answers to these questions might help later to better remember why these decisions (e.g., to use Bessel correction) were made.

gregor-robinson · Mar 25 '20 16:03