Statistics.jl
Pairwise Summation/Reduction for `var`
At the moment, `var` does a naive sum, adding up the squared deviations from the mean. However, when `var` is called on a collection, we could speed it up and significantly reduce the floating-point error by using pairwise summation with a recursive algorithm. Roughly:
mean([var(first_half), var(second_half)]) + var([mean(first_half), mean(second_half)])
(Note that this would require implementing fused statistics like `mean_and_var` from StatsBase, or else we would have to do more than one pass: one for the mean and one for the variance.)
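To make the idea concrete, here is a minimal sketch of what such a fused, recursive computation might look like. It is not code from Statistics.jl: `pairwise_moments`, `pairwise_mean_and_var`, and the `blocksize` cutoff of 64 are hypothetical names and values chosen for illustration, and the accumulators are assumed to be `Float64`. Each call returns the count, the mean, and the sum of squared deviations for a block, and two halves are merged with the standard pairwise update usually attributed to Chan, Golub, and LeVeque. The rough formula above is exact only for equal-sized halves with `corrected=false`; the merge step below handles unequal halves and expresses the "variance of the two means" term as a single scalar, so no intermediate array is needed.

```julia
using Statistics  # only needed to compare against the built-in mean and var

# Hypothetical helper (not part of Statistics.jl): returns (n, mean, M2) for
# x[lo:hi], where M2 is the sum of squared deviations from the block mean.
function pairwise_moments(x::AbstractVector{<:Real}, lo::Int, hi::Int; blocksize::Int=64)
    n = hi - lo + 1
    if n <= blocksize
        # Base case: plain two-pass computation on a small block.
        s = 0.0
        for i in lo:hi
            s += x[i]
        end
        m = s / n
        m2 = 0.0
        for i in lo:hi
            m2 += (x[i] - m)^2
        end
        return n, m, m2
    end
    mid = lo + div(hi - lo, 2)
    na, ma, m2a = pairwise_moments(x, lo, mid; blocksize=blocksize)
    nb, mb, m2b = pairwise_moments(x, mid + 1, hi; blocksize=blocksize)
    # Merge the two halves. The last term of m2 is the between-halves
    # contribution, computed from two scalars instead of an array of means.
    delta = mb - ma
    m = ma + delta * nb / n
    m2 = m2a + m2b + delta^2 * na * nb / n
    return n, m, m2
end

# Hypothetical fused entry point: one traversal yields both statistics.
function pairwise_mean_and_var(x::AbstractVector{<:Real}; corrected::Bool=true)
    isempty(x) && throw(ArgumentError("collection must be non-empty"))
    n, m, m2 = pairwise_moments(x, firstindex(x), lastindex(x))
    return m, m2 / (n - (corrected ? 1 : 0))
end
```

For example, `pairwise_mean_and_var([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]; corrected=false)` returns `(5.0, 4.0)`. Whether this is actually faster or more accurate than the current implementation would need benchmarking; the sketch only shows that the recursion can be written without allocating temporaries.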
Interesting. Do you have references about this? One tricky part would be to compute the variance of the means without storing them in an intermediate array; otherwise the performance benefit would probably be lost.