What algorithms are being used for the mean and variance?
As far as I can tell from the documentation of Mean() and Variance(), the algorithms on which your implementations are based are not documented. As an academic user, it is problematic for me to use your implementation which claims to be an online algorithm without any references. I ask that you please add information/references about the algorithms used to the documentation.
The algorithm for the mean is trivial.
if:
M[n] = sum(x[1:n]) / n
then:
M[n+1] = (x[n+1] + sum(x[1:n])) / (n+1)
= (x[n+1] + n*M[n]) / (n+1)
= x[n+1] / (n+1) + n / (n+1) * M[n]
Let g = 1/(n+1) and then you have
M[n+1] = g * x[n+1] + (1-g) * M[n]
= M[n] + g * (x[n+1] - M[n])
= smooth(M[n], x[n+1], 1 / (n+1))
which is the formula being used at: https://github.com/joshday/OnlineStatsBase.jl/blob/master/src/stats.jl#L471-L473
with the smooth method found here:
https://github.com/joshday/OnlineStatsBase.jl/blob/master/src/OnlineStatsBase.jl#L184
For standard weights, you have weight(n) = 1/n, so altogether you get o.μ = smooth(o.μ, x, o.weight(o.n += 1)).
For the variance, it appears to be Welford's online algorithm. You can check the wikipedia page and follow the citations there.
A lot of the algorithms here are covered in my dissertation: https://repository.lib.ncsu.edu/server/api/core/bitstreams/1c760e12-e284-43e0-a6c4-0271cc84912b/content (PDF).
Thank you both!
@joshday as a possible improvement to the package docs, please cite your dissertation somewhere prominently in the documentation so users can figure out which algorithms are used if they need to.