beast2 icon indicating copy to clipboard operation
beast2 copied to clipboard

LogAnalyser estimates of 95%HPDs

Open rbouckaert opened this issue 3 years ago • 0 comments

95% HPDs are constructed as follows: given N samples, x_1…x_N sorted by value, put a moving window of size n=(N-1)*95/100 over the sorted sequence and report the x_i and x_{i+n} that have smallest difference (x_{i+n}-x_i).

The problem is that k is not necessarily covering 95% of the samples. If N is 100, n=94, but if N is 101, n is still 94, so will cover only (n+1)/N=95/101 or approx 94% of the cases. The larger N, the smaller the potential error.

I observed some bias when sampling from a lognormal(M=1,S=1.25) (default prior for kappa in HKY model) and noticed the mean over 100 such samples that average 95%HPD interval sizes increased with increasing number of independent samples. True values of the 95%HPD are 0.0183 - 21.388, estimated values are

N lower upper 31 0.2515 16.5555 101 0.1409 20.3212 201 0.106 20.8855 631 0.0665 21.1164 1001 0.0547 21.4055

When I find the best x_i and x_{i+k} but extend the average of the interval (take (x_{i-1}x_i)/2 for lower, and (x{i+k}+x_{i+k+1})/2 as upper) the bias seems to go away, at least to a large extend for the upper bound, and to much lesser extend for the lower bound:

N lower upper 31 0.2498 19.8718 101 0.1405 21.5143 201 0.1045 21.4444 631 0.0653 21.3454 1001 0.0532 21.524

Not sure whether this is the right solution for this, but just noticing there appears to be some bias in the 95% HPD intervals of LogAnalyser in some cases, and this may have an impact on the way well calibrated simulation studies are done.

rbouckaert avatar Jan 11 '22 20:01 rbouckaert