
Threshold for R-hat (1.01 or 1.05)

Open fweber144 opened this issue 5 years ago • 12 comments

Summary:

In rstan:::throw_sampler_warnings() and on the help page for rstan::Rhat(), a threshold of 1.05 is used/recommended for checking the "new" R-hat, whereas 1.01 is the threshold recommended in other (more recent?) places. I am uncertain whether rstan's current behavior is intended or a bug.

Description:

It seems that 1.01 is the currently recommended threshold for the "new" R-hat (as proposed by Vehtari et al. (2020) and as implemented in rstan::Rhat()); see Vehtari et al. (2020) and the Brief Guide to Stan's Warnings. However, rstan:::throw_sampler_warnings() uses, and the help page for rstan::Rhat() recommends, the (older) threshold of 1.05. Is this intended or a bug?
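For reference, the bulk version of the rank-normalized split-R-hat from Vehtari et al. can be sketched in a few lines. This is an illustrative Python reimplementation, not rstan's actual (R/C++) code, and the demo data are seeded pseudo-random draws standing in for MCMC output:

```python
from statistics import NormalDist, mean, variance

def split_rhat(chains):
    """Classic split-R-hat for a list of equal-length chains."""
    # Split each chain in half so within-chain trends inflate R-hat.
    halves = []
    for c in chains:
        half = len(c) // 2
        halves += [c[:half], c[half:2 * half]]
    n = len(halves[0])
    W = mean(variance(h) for h in halves)        # within-chain variance
    B = n * variance([mean(h) for h in halves])  # between-chain variance
    var_plus = (n - 1) / n * W + B / n
    return (var_plus / W) ** 0.5

def rank_normalize(chains):
    """Replace draws by normal scores: pooled ranks -> inverse normal CDF."""
    pooled = sorted((x, i, j) for i, c in enumerate(chains)
                    for j, x in enumerate(c))
    S = len(pooled)
    out = [[0.0] * len(c) for c in chains]
    nd = NormalDist()
    for rank, (_, i, j) in enumerate(pooled, start=1):
        out[i][j] = nd.inv_cdf((rank - 3 / 8) / (S + 1 / 4))
    return out

def rhat(chains):
    """Rank-normalized split-R-hat (bulk version only, for brevity)."""
    return split_rhat(rank_normalize(chains))

# Demo on synthetic "chains":
import random
random.seed(1)
good = [[random.gauss(0, 1) for _ in range(500)] for _ in range(4)]
bad = [[random.gauss(mu, 1) for _ in range(500)] for mu in (0, 0, 0, 3)]
print(f"well-mixed chains: R-hat = {rhat(good):.3f}")
print(f"one shifted chain: R-hat = {rhat(bad):.3f}")
```

On the well-mixed chains R-hat lands close to 1, while the shifted chain pushes it well past either threshold.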

Btw, the reference for Vehtari et al. (2020) needs to be updated at some places:

  • the help page for rstan::Rhat()
  • the "Brief Guide to Stan's Warnings"
  • ... (perhaps other places)

Reproducible Steps:

May be seen from the source code of rstan:::throw_sampler_warnings() and ?rstan::Rhat.

Current Output:

Not applicable.

Expected Output:

Not applicable.

RStan Version:

I have 2.19.3 installed, but this also seems to apply to 2.21.1.

R Version:

This is independent of the R version.

Operating System:

This is independent of the OS.

fweber144 avatar Jul 24 '20 12:07 fweber144

The 1.01 number is our current recommendation.

It's a separate issue as to when to throw warnings at the user. I think it'd be better to do that at 1.01, too. Typically, our more stats-oriented devs who use R are wary of warnings, thinking they'll make users mistrust the software. After all, if glm() fails in R, say because of separability in a logistic regression, it doesn't throw any kind of warning at all. It just stops the iterations and gives coefficient estimates like 200.

bob-carpenter avatar Jul 24 '20 14:07 bob-carpenter

Thanks for bringing this up @fweber144. It looks like some things fell through the cracks when the latest warnings were implemented.

I'm also tagging @bgoodri @paul-buerkner @avehtari since we need to sort out the warnings for this across our R packages and papers. It's really a mess right now because there are too many places that need to get updated when something changes. I think the plan going forward is to move all of this to the posterior package so that all of the packages can rely on a single implementation of the convergence warnings that can be more easily kept in line with the latest recommendations. (This is related to https://github.com/stan-dev/rstan/issues/769 and https://github.com/stan-dev/posterior/issues/77)

Typically, our more stats-oriented devs who use R are wary of warnings, thinking they'll make users mistrust the software.

That definitely used to be true for certain people, but is that still true? We throw a ton of warnings in our R packages! In fact the topic of this post is not that we're missing a warning, just that the threshold for throwing it is a bit out of date.

jgabry avatar Jul 24 '20 19:07 jgabry

It's a separate issue as to when to throw warnings at the user. I think it'd be better to do that at 1.01, too. Typically, our more stats-oriented devs who use R are wary of warnings, thinking they'll make users mistrust the software.

Yes, you're right that the recommendation and the warning can be treated separately. But the way this is currently handled in rstan is rather confusing, I think. A warning based on the 1.01 threshold could also be worded cautiously, along the lines of: "The current threshold of 1.01 is rather strict, so keep the possibility of false positive alarms in mind."
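A sketch of what such a cautiously worded check could look like. The function name, signature, and message text are hypothetical, chosen just for illustration; this is not rstan's actual rstan:::throw_sampler_warnings():

```python
def rhat_warning(rhats, threshold=1.01):
    """Return a cautious warning string if any R-hat exceeds `threshold`,
    else None. `rhats` maps parameter names to R-hat estimates."""
    flagged = {name: r for name, r in rhats.items() if r > threshold}
    if not flagged:
        return None
    worst = max(flagged, key=flagged.get)
    return (
        f"{len(flagged)} parameter(s) had R-hat > {threshold} "
        f"(worst: {worst} = {flagged[worst]:.3f}). This threshold is "
        "deliberately strict, so keep false positives in mind, especially "
        "for models with many parameters. See "
        "https://mc-stan.org/misc/warnings.html#r-hat"
    )

# Demo:
print(rhat_warning({"alpha": 1.005, "beta": 1.020}))
print(rhat_warning({"alpha": 1.005}))
```

The point is only that the message itself, not the threshold, can carry the "this may be a false alarm" caveat.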

After all, if glm() fails in R, say because of separability in a logistic regression, it doesn't throw any kind of warning at all. It just stops the iterations and gives coefficient estimates like 200.

Yes, that glm() behavior is definitely not the way it should be done.

I also agree that in the long term, handling the convergence diagnostics in one central place is a good idea, @jgabry. The posterior package seems like a good choice, especially since cmdstanr already relies on it.

That definitely used to be true for certain people, but is that still true? We throw a ton of warnings in our R packages!

I've always found Stan's/rstan's etc. warnings very helpful, especially in combination with all the recommendations you give in the documentation, in The Stan Forums, in papers, etc. So I think you should keep as many sensible warnings as possible. As explained above, the content of the warning message may always be formulated cautiously.

fweber144 avatar Jul 24 '20 20:07 fweber144

I would say anything above 1.01 is cause for concern with Stan, but we thought that with thousands of parameters, some R-hats would be estimated above 1.01 even if convergence was fine. So the warning threshold was bumped up to 1.05.
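That multiple-comparisons point can be illustrated with a small Monte Carlo sketch: even for perfectly mixed chains, the estimated R-hat fluctuates, so the maximum over many parameters can exceed 1.01 by chance alone. This uses the plain split-R-hat (no rank normalization) for brevity, and the chain and parameter counts are illustrative assumptions:

```python
import random
from statistics import mean, variance

def split_rhat(chains):
    """Plain split-R-hat for a list of equal-length chains."""
    halves = [h for c in chains
              for h in (c[:len(c) // 2], c[len(c) // 2:])]
    n = len(halves[0])
    W = mean(variance(h) for h in halves)        # within-chain variance
    B = n * variance([mean(h) for h in halves])  # between-chain variance
    return (((n - 1) / n * W + B / n) / W) ** 0.5

random.seed(42)
# 1000 "parameters", each with 4 well-mixed chains of 100 iid draws.
rhats = [
    split_rhat([[random.gauss(0, 1) for _ in range(100)] for _ in range(4)])
    for _ in range(1000)
]
print(f"max R-hat over 1000 parameters: {max(rhats):.3f}")
print(f"fraction above 1.01: {sum(r > 1.01 for r in rhats) / len(rhats):.3f}")
```

With chains this short, a handful of estimated R-hats exceed 1.01 purely from sampling noise even though every parameter has converged, which is the rationale given above for the 1.05 warning threshold.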

bgoodri avatar Jul 24 '20 22:07 bgoodri

[...] if there were thousands of parameters, some R-hats would be estimated greater than 1.01 even if the convergence was fine.

Yes, that's something I've also encountered for some models/data. But personally, I would still prefer "softening" the warning message for the 1.01 threshold over raising the threshold to 1.05. One could also add your remark about models with many parameters to the R-hat section of the Brief Guide to Stan's Warnings; the warning message from rstan:::throw_sampler_warnings() links to that section.

fweber144 avatar Jul 25 '20 10:07 fweber144

I think that in our latest Rhat paper we also moved to 1.05 as we realized 1.01 was too strict (EDIT: I was wrong as Aki clarified). @avehtari may know more of that history.

paul-buerkner avatar Jul 26 '20 08:07 paul-buerkner

I think that in our latest Rhat paper we also moved to 1.05 as we realized 1.01 was too strict.

No we didn't.

Let me get back from my vacation and I'll post a justified proposal to go beyond a simple dichotomizing threshold.

avehtari avatar Jul 26 '20 09:07 avehtari

You are right, we did not. Not sure why I thought we did. Anyway, thanks for clarifying.

paul-buerkner avatar Jul 26 '20 09:07 paul-buerkner

Btw, this issue also applies to the explanation printed by rstan::monitor().

fweber144 avatar Jul 28 '20 11:07 fweber144

@avehtari Hi, it has been a while since this post. Has this topic been concluded elsewhere? Could you suggest where I can follow it?

chainorato avatar Jan 17 '24 05:01 chainorato

Here's the published paper. Nothing in Stan enforces any kind of threshold on R-hat, but we have switched over our convergence monitoring and ESS estimation to the approach outlined here.

https://projecteuclid.org/journals/bayesian-analysis/volume-16/issue-2/Rank-Normalization-Folding-and-Localization--An-Improved-Rˆ-for/10.1214/20-BA1221.full

bob-carpenter avatar Jan 17 '24 17:01 bob-carpenter

Thank you for your answer! @bob-carpenter

chainorato avatar Jan 18 '24 03:01 chainorato