StatsBase.jl
StatsBase.jl copied to clipboard
Allow and Provide Different Estimators for Autocovariance
As mentioned in #273, there are many possible estimators or calculations for the autocovariance function. For example, the autocovariance can be estimated using:
- A denominator of
n(orn-1with Bessel's correction) for all lags and the naive algorithm - A denominator of
n-k(orn-k-1with Bessel's correction) for all lagskand the naive algorithm - An approximate algorithm based on the FFT, which can be calculated in
O(n*log(n))time. (I will defer to @devmotion on this, as he implemented this in MCMCChains and knows more about it than me.) - Some other estimator -- for example, Geyer's initial positive sequence estimator finds the smallest lag
kwhereγ[k] + γ[k+1] < 0, then setsγ[k]=0for all largerkvalues. (The idea being that negative autocovariance is rare, so negative values reflect noise; observing a negative value implies the noise is greater than the signal.)
I think the complexity of autocovariance estimation warrants allowing different estimators for autocovariance (like we do for the covariance) and also providing at least a couple of these.
It would indeed make sense to support some simple variants like we do for covariance. The more complex variants could be supported via custom AbstractCovarianceEstimator types provided by other package (again like for covariance).
In MCMCChains/MCMCDiagnosticTools, we use a bit special setup, due to our application and for performance reasons, so I'm not completely sure how generally useful it is.
The implementation there, and some of the algorithms, are based on Stan and https://arxiv.org/pdf/1903.08008.pdf. The We only use the initial positive sequence estimator, which also implies that generally we only need a few lags. Hence, in contrast to what is reported in literature and done in Stan, the FFT method usually is not the most performant alternative. This is also consistent with the observation in MCMCDiagnostics, and one additional reason for why it's not the default method (another one is that it requires loading additional packages). As suggested by Geyer we also don't use denominators of n - k (or n - k - 1) in the default and the FFT method, which should basically give the same results. I don't remember the normalization of the variogram estimator from BDA right now, but I'm certain we didn't add any customizations there.
In MCMCChains/MCMCDiagnosticTools, we use a bit special setup, due to our application and for performance reasons, so I'm not completely sure how generally useful it is.
The implementation there, and some of the algorithms, are based on Stan and https://arxiv.org/pdf/1903.08008.pdf. The We only use the initial positive sequence estimator, which also implies that generally we only need a few lags. Hence, in contrast to what is reported in literature and done in Stan, the FFT method usually is not the most performant alternative. This is also consistent with the observation in MCMCDiagnostics, and one additional reason for why it's not the default method (another one is that it requires loading additional packages). As suggested by Geyer we also don't use denominators of
n - k(orn - k - 1) in the default and the FFT method, which should basically give the same results. I don't remember the normalization of the variogram estimator from BDA right now, but I'm certain we didn't add any customizations there.
The variogram estimator always uses a denominator of n-h, which makes it much less downward-biased than the sample covariance matrix with denominator of n (what we use now).
As said, in the literature the use of n (and not n - k) is recommended for MCMC analysis. Surely not everyone agrees - but the nice thing is that you can just choose whatever you prefer in MCMCChains and MCMCDiagnosticTools 🙂
As said, in the literature the use of
n(and notn - k) is recommended for MCMC analysis. Surely not everyone agrees - but the nice thing is that you can just choose whatever you prefer in MCMCChains and MCMCDiagnosticTools slightly_smiling_face
For the sample autocovariance, yes, n is better than n-k; the variogram estimator combined with the Geyer stopping rule is different though, because the differencing in the variogram yields more stable estimates. From BDA3:

Not sure what you want to say unfortunately. We implement exactly the version from BDA.
Not sure what you want to say unfortunately. We implement exactly the version from BDA.
I'm saying that the version from BDA uses n-k instead of n. Actually, this might explain why (when testing ParetoSmooth) I found it gives significantly different ESS estimates than Stan (as in, noticeably different, not practically different). I'll make an issue.
I'm saying that the version from BDA uses
n-kinstead ofn.
Sure, but that's what's implemented in MCMCDiagnosticTools. I don't understand where the problem is.
Stan uses the FFT approach, so any other comparison will be flawed.