StatsBase.jl icon indicating copy to clipboard operation
StatsBase.jl copied to clipboard

Cumulant function is not numerically stable

Open mattcbro opened this issue 2 years ago • 4 comments

The recursion used for calculating cumulants from a sequence of data is not numerically stable. Apparently this is known in the literature and is discussed in this paper: https://www-2.rotman.utoronto.ca/~kan/papers/ncq.pdf

I offer a simple test showing that the cumulants of a gaussian vector, which should all be zero after the second one, diverge.

using StatsBase
using Plots

# create a long sampled vector from a normal distribution
N = 10000
v = randn(N)


# compute and plot the cumulants.  Only the second
# one should be nonzero
kix = 1:15
rkstats = cumulant(v, kix)
plot(kix, rkstats)

mattcbro avatar Oct 19 '23 00:10 mattcbro

Thank you so much for opening this issue! Would you be willing to open a PR implementing the more precise algorithms? (Is there any performance cost to doing so?)

ParadaCarleton avatar Oct 28 '23 02:10 ParadaCarleton

Scipy restricts the range to 1:4, perhaps to avoid the numerical instabilities?

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstat.html

alecloudenback avatar Oct 28 '23 03:10 alecloudenback

Thank you so much for opening this issue! Would you be willing to open a PR implementing the more precise algorithms? (Is there any performance cost to doing so?)

I'll look into it. I haven't tried to implement the one in the paper. Since it requires a solution to an eigenvalue problem it may be slower, but correctness is of course more important.

mattcbro avatar Oct 30 '23 08:10 mattcbro

That would be great, thank you!

It might also be worth checking out what OnlineStats.jl does. IME it usually has cleaner/faster/more accurate implementations than we do.

ParadaCarleton avatar Oct 31 '23 03:10 ParadaCarleton