"thin" argument affects diagnostics
Describe the bug
If I use the same seed (to ensure reproducibility) and do two model$sample() runs with identical arguments except that one has thin=4, the thinned run reports fewer divergent transitions, fewer max_treedepth transitions, and a different (misleadingly higher) EBFMI. (Probably an underlying Stan issue, not just cmdstanr)
To Reproduce
model <- cmdstanr::cmdstan_model(path_to_funnel_model)
fit1 <- model$sample(data=list(), seed=12345678, chains=1, iter_warmup=1000, iter_sampling=4000)
fit1$diagnostic_summary()
# ^^^ num_divergent = 4, num_max_treedepth = 2, ebfmi = 0.08203262
fit2 <- model$sample(data=list(), seed=12345678, chains=1, iter_warmup=1000, iter_sampling=4000, thin=4)
fit2$diagnostic_summary()
# ^^^ num_divergent = 0, num_max_treedepth = 1, ebfmi = 0.2993997
Expected behavior Thinning should only affect the number of draws saved, not the diagnostics. A divergent transition is an indication of a possible problem whether or not you save it.
Additional context
cmdstanr version 0.6.0.9000, R version 4.2.2, macOS 13.4 (Ventura)
(Probably an underlying Stan issue, not just cmdstanr)
Yeah, if you run fit2$cmdstan_diagnose(), which runs CmdStan's diagnose utility, I think you'd see the same problem, indicating that CmdStan itself doesn't handle this properly. I'm going to move this issue to the CmdStan repository because I don't think we can address it in CmdStanR without a change to CmdStan.
I think the problem here is that diagnostics like divergences are written to CSV alongside the draws and summarized afterwards, but thinning reduces how many iterations are written to CSV. So CmdStan would need some way to count and report divergences that doesn't require writing every iteration to CSV.
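To illustrate the idea (this is a hypothetical sketch, not CmdStan's actual code): keep a running divergence counter over every post-warmup iteration, independent of which iterations get written to the thinned CSV. The sampler loop, divergence probability, and column name `divergent__` below are stand-ins.

```python
# Hypothetical sketch: decouple divergence counting from thinned CSV output.
import random

def run_sampler(num_iters, thin, p_divergent=0.01, seed=0):
    rng = random.Random(seed)
    saved_rows = []           # rows that would reach the CSV (thinned)
    total_divergences = 0     # counted on every iteration, regardless of thinning
    for i in range(num_iters):
        divergent = rng.random() < p_divergent  # stand-in for a real HMC divergence
        total_divergences += divergent
        if i % thin == 0:     # only every thin-th draw is written out
            saved_rows.append({"iter": i, "divergent__": int(divergent)})
    return saved_rows, total_divergences

rows, total = run_sampler(num_iters=4000, thin=4, seed=1)
csv_count = sum(r["divergent__"] for r in rows)  # what a CSV-based summary sees
```

Summarizing only `csv_count` is exactly the undercounting described above; the running counter sees every iteration.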
The average tree depth should have the right expectation, so I wouldn't worry about that.
EBFMI is like R-hat and ESS in that it fundamentally depends on the draws. I would not want to see it computed on the underlying unthinned draws; these are features of the saved post-warmup draws.
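To see why the estimate legitimately moves under thinning: E-BFMI is typically estimated as the mean squared first difference of the energy divided by the variance of the energy, so dropping intermediate draws from an autocorrelated energy series raises the estimate. A pure-Python sketch, assuming that estimator form and using an AR(1) series as a stand-in for sticky HMC energies:

```python
import random
import statistics

def ebfmi(energy):
    # Assumed estimator: mean((E[i] - E[i-1])^2) / Var(E).
    diffs_sq = [(b - a) ** 2 for a, b in zip(energy, energy[1:])]
    return (sum(diffs_sq) / len(diffs_sq)) / statistics.variance(energy)

# AR(1) stand-in for a highly autocorrelated energy series (low E-BFMI).
rng = random.Random(42)
energy = [0.0]
for _ in range(3999):
    energy.append(0.95 * energy[-1] + rng.gauss(0.0, 1.0))

full = ebfmi(energy)
thinned = ebfmi(energy[::4])  # thin=4: consecutive kept draws are less correlated
```

With lag-1 correlation 0.95, thinning by 4 leaves a lag-4 correlation of roughly 0.95^4, so the thinned estimate is noticeably larger, mirroring the 0.08 vs 0.30 numbers in the report.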
That leaves divergences. I'd instead recommend estimating the number of divergences as the thinning rate times the reported divergences, which I think should be correct in expectation.
And perhaps also the total number of log density evaluations, which we lose, other than in expectation, once we thin.