
Use biased autocorrelation estimate by default

Open jonny-so opened this issue 1 year ago • 1 comments

I noticed some very erratic and unexpected behaviour from the effective_sample_size diagnostic, caused by extreme values in the far right tail of the estimated autocorrelation function. This is easy to reproduce by plotting the autocorrelation of a sequence of 1000 IID Gaussians.

This behaviour seems to be due to the following line: https://github.com/pyro-ppl/numpyro/blob/2f1bccdba2fc7b0a6ec235ca1bd5ce2417a0635c/numpyro/diagnostics.py#L130, which I believe is there to make the estimate unbiased. Stan, however, uses the biased estimate of Geyer (1992) (see https://github.com/stan-dev/stan/blob/634034deb3abd6314d980c1aab083f64269f4019/src/stan/analyze/mcmc/autocovariance.hpp#L60), presumably to stop the variance in the right tail from exploding.
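To make the difference concrete, here is a minimal NumPy sketch of the two estimators. It is not the NumPyro or Stan code: the function name `autocorrelation` and its `biased` flag are hypothetical, and it assumes the unbiased variant divides the lag-k sum of products by N - k while the biased variant divides by N.

```python
import numpy as np


def autocorrelation(x, biased=True):
    """Sample autocorrelation of a 1-D sequence at lags 0..N-1.

    Hypothetical helper for illustration, not the NumPyro implementation.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    centered = x - x.mean()
    # Zero-pad to length 2N so the circular FFT correlation matches the linear one.
    size = 2 * n
    freq = np.fft.rfft(centered, n=size)
    # raw[k] = sum over t of centered[t] * centered[t + k]
    raw = np.fft.irfft(freq * np.conj(freq), n=size)[:n]
    if biased:
        acov = raw / n  # same divisor at every lag
    else:
        acov = raw / (n - np.arange(n))  # divide by the number of terms at each lag
    return acov / acov[0]


rng = np.random.default_rng(0)
x = rng.standard_normal(1000)  # IID Gaussians: the true autocorrelation is 0 for k > 0
rho_biased = autocorrelation(x, biased=True)
rho_unbiased = autocorrelation(x, biased=False)

tail = slice(900, None)
print("max |rho| in tail, biased:  ", np.abs(rho_biased[tail]).max())
print("max |rho| in tail, unbiased:", np.abs(rho_unbiased[tail]).max())
```

Under these assumptions the two estimates differ only by the factor N / (N - k), which is close to 1 at small lags and grows to N at the last lag, so the unbiased version amplifies the noise in the far right tail.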

I have local changes which (optionally) use the biased estimate. I think this should be the default, partly to be consistent with Stan, but also because the current behaviour seems too erratic to be useful. It seems to be an intentional departure from the Stan implementation, however, so I thought it best to open an issue here to discuss.

jonny-so avatar Apr 19 '24 12:04 jonny-so

Wow, this is subtle. It would be great if you could contribute a PR, @jonny-so! Could you also add some tests to illustrate that the biased estimate behaves better in some cases? Thanks!

fehiepsi avatar Apr 29 '24 21:04 fehiepsi

I created a PR. Also, just to correct the record having read some more: the biased estimator was discussed, but not proposed, by Geyer (1992). It is also discussed in depth by Priestley (1981, Section 5.3), who cites Parzen (1961) and Schaerf (1964). I was also wrong to say that the variance explodes in the tail; it just remains O(1) as the chain length grows, leading to "wild" tails (Priestley, 1981).
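The O(1) claim can be checked empirically with a small simulation. This is a rough sketch under stated assumptions: "tail" is taken to mean a lag a fixed distance (`gap`, here 5) from the end of the chain, the chains are IID Gaussians, and `tail_autocorr` is a hypothetical helper rather than anything from NumPyro.

```python
import numpy as np


def tail_autocorr(chains, lag, biased):
    """Autocorrelation estimate at a single lag for each row of `chains`.

    Hypothetical helper for illustration only.
    """
    n = chains.shape[1]
    centered = chains - chains.mean(axis=1, keepdims=True)
    # Sum of the n - lag available lagged products for every chain.
    raw = np.sum(centered[:, : n - lag] * centered[:, lag:], axis=1)
    divisor = n if biased else n - lag
    var0 = np.mean(centered**2, axis=1)  # lag-0 autocovariance of each chain
    return raw / divisor / var0


rng = np.random.default_rng(0)
gap = 5  # distance of the inspected lag from the end of the chain
results = {}
for n in (250, 1000, 4000):
    chains = rng.standard_normal((500, n))  # 500 independent IID chains
    lag = n - gap
    results[n] = (
        tail_autocorr(chains, lag, biased=False).var(),
        tail_autocorr(chains, lag, biased=True).var(),
    )
    print(n, "unbiased var:", results[n][0], "biased var:", results[n][1])
```

In this setup the variance of the unbiased estimate across chains should stay at roughly the same size as the chain length grows, because only `gap` products are averaged at that lag, while the variance of the biased estimate shrinks with the chain length.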

jonny-so avatar Oct 05 '24 20:10 jonny-so