Statistics.jl

Pairwise Summation/Reduction for `var`

Open ParadaCarleton opened this issue 3 years ago • 2 comments

At the moment, `var` computes a naive sum of the squared deviations from the mean. However, when `var` is called on a collection, we could speed it up and also significantly reduce floating-point error by using pairwise summation with a recursive algorithm. Roughly:

mean([var(first_half; corrected=false), var(second_half; corrected=false)]) + var([mean(first_half), mean(second_half)]; corrected=false)

(Note that this would require implementing fused statistics like `mean_and_var` from StatsBase; otherwise we would need more than one pass, one for the mean and one for the variance.)

ParadaCarleton • Jan 30 '22 17:01
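As a quick sanity check of the decomposition proposed above, here is a minimal sketch. It assumes the two halves have equal length and that the population variance (`corrected=false`) is used throughout; with Julia's default `corrected=true` the identity does not hold exactly. `var_from_halves` is a hypothetical helper name, not part of Statistics.jl.

```julia
using Statistics

# Hypothetical helper (not Statistics.jl API): population variance of `x`
# rebuilt from its two halves. This sketch assumes length(x) is even.
function var_from_halves(x::AbstractVector{<:Real})
    n = length(x)
    iseven(n) || throw(ArgumentError("this sketch assumes an even length"))
    h = n ÷ 2
    i0 = firstindex(x)
    a = view(x, i0:i0+h-1)          # first half
    b = view(x, i0+h:lastindex(x))  # second half
    # Average of the within-half (population) variances ...
    within = (var(a; corrected=false) + var(b; corrected=false)) / 2
    # ... plus the (population) variance of the two half means.
    between = var([mean(a), mean(b)]; corrected=false)
    return within + between
end

x = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]
var_from_halves(x) ≈ var(x; corrected=false)
```

For halves of unequal length the two terms would need to be weighted by the half sizes, which this sketch does not attempt.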

Interesting. Do you have references for this? One tricky part would be computing the variance of the means without storing them in an intermediate array; otherwise the performance benefit would probably be lost.

nalimilan • Feb 06 '22 13:02
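On the concern about intermediate arrays: one possible approach, sketched below, is to carry a small summary `(count, mean, sum of squared deviations)` up the recursion and merge two summaries with the pairwise update usually attributed to Chan, Golub and LeVeque. All names here (`merge_moments`, `pairwise_moments`, `pairwise_var`, the `blocksize` keyword) are hypothetical and chosen for illustration; this is not how Statistics.jl implements `var`, and no performance claim is made.

```julia
using Statistics

# Merge two partial summaries (count, mean, sum of squared deviations).
# Hypothetical helper; only three scalars per half are needed, no arrays.
function merge_moments(a::NTuple{3,Float64}, b::NTuple{3,Float64})
    na, ma, Sa = a
    nb, mb, Sb = b
    n = na + nb
    δ = mb - ma
    return (n, ma + δ * nb / n, Sa + Sb + δ^2 * na * nb / n)
end

# Summarise x[lo:hi] recursively; small blocks use a plain two-pass loop.
function pairwise_moments(x::AbstractVector{<:Real}, lo::Int, hi::Int, blocksize::Int)
    n = hi - lo + 1
    if n <= blocksize
        m = 0.0
        for i in lo:hi
            m += x[i]
        end
        m /= n
        S = 0.0
        for i in lo:hi
            S += (x[i] - m)^2
        end
        return (Float64(n), m, S)
    end
    mid = lo + (hi - lo) ÷ 2
    return merge_moments(pairwise_moments(x, lo, mid, blocksize),
                         pairwise_moments(x, mid + 1, hi, blocksize))
end

# Variance from the merged summary; `corrected` mirrors the keyword of `var`.
function pairwise_var(x::AbstractVector{<:Real}; corrected::Bool=true, blocksize::Int=64)
    isempty(x) && throw(ArgumentError("collection must be non-empty"))
    blocksize >= 1 || throw(ArgumentError("blocksize must be at least 1"))
    n, _, S = pairwise_moments(x, firstindex(x), lastindex(x), blocksize)
    return S / (n - (corrected ? 1 : 0))
end

x = collect(1.0:1000.0) .^ 2
pairwise_var(x) ≈ var(x)
```

Because the merge weights each half by its own count, the halves do not need to be the same length, and the mean comes out of the same recursion, which is the fused mean-and-variance computation mentioned earlier in the thread.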