pints icon indicating copy to clipboard operation
pints copied to clipboard

Diagnostics module design

Open MichaelClerx opened this issue 4 years ago • 19 comments

We have a private module pints._diagnostics that declares public and private methods, some of the public methods are exposed to the user as part of the public API.

However, the test for diagnostics imports the hidden module directly and tests methods not visible to the user.

Maybe this is fine, but usually it's a sign that we need to redesign a bit?

I'd propose making the diagnostics module public, so that pints.effective_sample_size becomes e.g. pints.diagnostics.effective_sample_size. Then we can make other methods, e.g. autocorrelation public too, without cluttering the pints namespace.

We could also imagine having them under pints.mcmc, but at the moment that only contains MCMCSamplers (even the MCMCController is in pints)

Thoughts @ben18785 @DavAug ?

MichaelClerx avatar Feb 09 '21 14:02 MichaelClerx

Sounds like a plan. I'd keep the diagnostics outside of pints.mcmc as can imagine having diagnostics for other methods later on (and it's annoying-er to do import pints.mcmc.diagnostics).

On Tue, Feb 9, 2021 at 2:55 PM Michael Clerx [email protected] wrote:

We have a private module pints._diagnostics that declares public and private methods, some of the public methods are exposed to the user as part of the public API.

However, the test for diagnostics imports the hidden module directly and tests methods not visible to the user.

Maybe this is fine, but usually it's a sign that we need to redesign a bit?

I'd propose making the diagnostics module public, so that pints.effective_sample_size becomes e.g. pints.diagnostics.effective_sample_size. Then we can make other methods, e.g. autocorrelation public too, without cluttering the pints namespace.

We could also imagine having them under pints.mcmc, but at the moment that only contains MCMCSamplers (even the MCMCController is in pints)

Thoughts @ben18785 https://github.com/ben18785 @DavAug https://github.com/DavAug ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pints-team/pints/issues/1283, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABCILKABDCDRRHHPL2JU2JDS6FEGDANCNFSM4XLERGWA .

ben18785 avatar Feb 09 '21 14:02 ben18785

@DavAug are you happy to pick this up? You seem to be into diagnostics at the moment 😄

MichaelClerx avatar Feb 09 '21 15:02 MichaelClerx

Yep no problem. But before I do this @ben18785 are we still planning on outsourcing all of this to arviz? Because then it may not be necessary to implement a diagnostics module?

Get Outlook for iOShttps://aka.ms/o0ukef


From: Michael Clerx [email protected] Sent: Tuesday, February 9, 2021 4:30:38 PM To: pints-team/pints [email protected] Cc: David Augustin [email protected]; Mention [email protected] Subject: Re: [pints-team/pints] Diagnostics module design (#1283)

@DavAughttps://github.com/DavAug are you happy to pick this up? You seem to be into diagnostics at the moment 😄

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/pints-team/pints/issues/1283#issuecomment-776024919, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AEY2T3XVRGUPKKROPSG6SLTS6FIJ5ANCNFSM4XLERGWA.

DavAug avatar Feb 09 '21 15:02 DavAug

So, I think we still want our existing diagnostics methods but that any new ones come from Arviz...

On Tue, Feb 9, 2021 at 3:47 PM David Augustin [email protected] wrote:

Yep no problem. But before I do this @ben18785 are we still planning on outsourcing all of this to arviz? Because then it may not be necessary to implement a diagnostics module?

Get Outlook for iOShttps://aka.ms/o0ukef


From: Michael Clerx [email protected] Sent: Tuesday, February 9, 2021 4:30:38 PM To: pints-team/pints [email protected] Cc: David Augustin [email protected]; Mention < [email protected]> Subject: Re: [pints-team/pints] Diagnostics module design (#1283)

@DavAughttps://github.com/DavAug are you happy to pick this up? You seem to be into diagnostics at the moment 😄

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub< https://github.com/pints-team/pints/issues/1283#issuecomment-776024919>, or unsubscribe< https://github.com/notifications/unsubscribe-auth/AEY2T3XVRGUPKKROPSG6SLTS6FIJ5ANCNFSM4XLERGWA>.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pints-team/pints/issues/1283#issuecomment-776037278, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABCILKHHT5HJ726RJFQFRWDS6FKJNANCNFSM4XLERGWA .

ben18785 avatar Feb 09 '21 15:02 ben18785

I am modifying the docstrings a little bit to let the users know in more detail which assumptions are made for the ESS and autocorrelation computation. Following the stan discussion we are currently actually using a somewhat less motivated truncation criterion for the ESS computation, see https://discourse.mc-stan.org/t/n-eff-bda3-vs-stan/2608/20.

In brief, currently we only keep the correlations up to the first negative correlation. Apparently there is no theoretic justification for this, instead one should consider pairs of correlations, i.e. (sum of correlation at lag 0 and 1, at lag 2 and 3, at lag 4 and 5 etc.). The first negative pair should be used as truncation criterion.

I would like to make changes according to stan and just wanted to check with you whether there was a theoretical justification for the current implementation, @ben18785 @MichaelClerx @chonlei @rcw5890 ?

DavAug avatar Feb 14 '21 11:02 DavAug

@MichaelClerx , I have started to move diagnostics functions into a separate diagnostics module. How does the deprecation work for those functions? Or is it ok to just delete them from the pints namespace?

DavAug avatar Feb 14 '21 12:02 DavAug

@DavAug I usually do something like this:

import warnings

def method_that_I_want_to_remove_now(self):
    # Deprecated on 2021-02-14
    warnings.warn('pints.method_that_I_want_to_remove_now is deprecated and will be removed in future versions of Pints')
    <original code here>
    
def method_with_new_name(self):
    <method code here>

def method_with_old_name(self, x):
    # Deprecated on ...
    warnings.warn('pints.method_with_old name is deprecated and will be removed in future versions of Pints, please use method_with_new_name() instead.')
    return method_with_new_name(x)

MichaelClerx avatar Feb 14 '21 13:02 MichaelClerx

Hi David, happy for you to make those changes if you like. Perhaps it'd just be better though to move wholesale to Arviz? There are a host of changes to ESS that are possible (your proposed is one of them), so rather than reinvent the wheel, I think just moving to Arviz makes sense.

On Sun, Feb 14, 2021 at 11:14 AM David Augustin [email protected] wrote:

I am modifying the docstrings a little bit to let the users know in more detail which assumptions are made for the ESS and autocorrelation assumptions. Following the stan discussion we are currently actually using a somewhat less motivated truncation criterion for the ESS computation, see https://discourse.mc-stan.org/t/n-eff-bda3-vs-stan/2608/20.

In brief, currently we only keep the correlations up to the first negative correlation. Apparently there is no theoretic justification for this, instead one should consider pairs of correlations, i.e. (sum of correlation at lag 0 and 1, at lag 2 and 3, at lag 4 and 5 etc.). The first negative pair should be used as truncation criterion.

I would like to make changes according to stan and just wanted to check with you whether there was a theoretical justification for the current implementation, @ben18785 https://github.com/ben18785 @MichaelClerx https://github.com/MichaelClerx @chonlei https://github.com/chonlei @rcw5890 https://github.com/rcw5890 ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pints-team/pints/issues/1283#issuecomment-778762964, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABCILKGNAOUKZNFYM3SMHS3S66WBRANCNFSM4XLERGWA .

ben18785 avatar Feb 14 '21 17:02 ben18785

I agree hundred percent that arviz does everything we want (except maybe the multivariate rhat). I thought however that the current metrics were supposed to be moved? If that’s the case we might want to fix the bug in ESS because the current truncation seems to be wrong / arbitrary.

Get Outlook for iOShttps://aka.ms/o0ukef


From: ben18785 [email protected] Sent: Sunday, February 14, 2021 6:36:58 PM To: pints-team/pints [email protected] Cc: David Augustin [email protected]; Mention [email protected] Subject: Re: [pints-team/pints] Diagnostics module design (#1283)

Hi David, happy for you to make those changes if you like. Perhaps it'd just be better though to move wholesale to Arviz? There are a host of changes to ESS that are possible (your proposed is one of them), so rather than reinvent the wheel, I think just moving to Arviz makes sense.

On Sun, Feb 14, 2021 at 11:14 AM David Augustin [email protected] wrote:

I am modifying the docstrings a little bit to let the users know in more detail which assumptions are made for the ESS and autocorrelation assumptions. Following the stan discussion we are currently actually using a somewhat less motivated truncation criterion for the ESS computation, see https://discourse.mc-stan.org/t/n-eff-bda3-vs-stan/2608/20.

In brief, currently we only keep the correlations up to the first negative correlation. Apparently there is no theoretic justification for this, instead one should consider pairs of correlations, i.e. (sum of correlation at lag 0 and 1, at lag 2 and 3, at lag 4 and 5 etc.). The first negative pair should be used as truncation criterion.

I would like to make changes according to stan and just wanted to check with you whether there was a theoretical justification for the current implementation, @ben18785 https://github.com/ben18785 @MichaelClerx https://github.com/MichaelClerx @chonlei https://github.com/chonlei @rcw5890 https://github.com/rcw5890 ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pints-team/pints/issues/1283#issuecomment-778762964, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABCILKGNAOUKZNFYM3SMHS3S66WBRANCNFSM4XLERGWA .

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/pints-team/pints/issues/1283#issuecomment-778811288, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AEY2T3UVNWICCI2T5XLJHJLS7AC3VANCNFSM4XLERGWA.

DavAug avatar Feb 14 '21 17:02 DavAug

Hi David: it's not actually a bug, just the old definition of how this was handled for ESS. So, think it's ok to leave as is rather than update to a newer method for calculating it.

On Sun, Feb 14, 2021 at 5:41 PM David Augustin [email protected] wrote:

I agree hundred percent that arviz does everything we want (except maybe the multivariate rhat). I thought however that the current metrics were supposed to be moved? If that’s the case we might want to fix the bug in ESS because the current truncation seems to be wrong / arbitrary.

Get Outlook for iOShttps://aka.ms/o0ukef


From: ben18785 [email protected] Sent: Sunday, February 14, 2021 6:36:58 PM To: pints-team/pints [email protected] Cc: David Augustin [email protected]; Mention < [email protected]> Subject: Re: [pints-team/pints] Diagnostics module design (#1283)

Hi David, happy for you to make those changes if you like. Perhaps it'd just be better though to move wholesale to Arviz? There are a host of changes to ESS that are possible (your proposed is one of them), so rather than reinvent the wheel, I think just moving to Arviz makes sense.

On Sun, Feb 14, 2021 at 11:14 AM David Augustin [email protected] wrote:

I am modifying the docstrings a little bit to let the users know in more detail which assumptions are made for the ESS and autocorrelation assumptions. Following the stan discussion we are currently actually using a somewhat less motivated truncation criterion for the ESS computation, see https://discourse.mc-stan.org/t/n-eff-bda3-vs-stan/2608/20.

In brief, currently we only keep the correlations up to the first negative correlation. Apparently there is no theoretic justification for this, instead one should consider pairs of correlations, i.e. (sum of correlation at lag 0 and 1, at lag 2 and 3, at lag 4 and 5 etc.). The first negative pair should be used as truncation criterion.

I would like to make changes according to stan and just wanted to check with you whether there was a theoretical justification for the current implementation, @ben18785 https://github.com/ben18785 @MichaelClerx https://github.com/MichaelClerx @chonlei https://github.com/chonlei @rcw5890 https://github.com/rcw5890 ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <https://github.com/pints-team/pints/issues/1283#issuecomment-778762964 , or unsubscribe < https://github.com/notifications/unsubscribe-auth/ABCILKGNAOUKZNFYM3SMHS3S66WBRANCNFSM4XLERGWA

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub< https://github.com/pints-team/pints/issues/1283#issuecomment-778811288>, or unsubscribe< https://github.com/notifications/unsubscribe-auth/AEY2T3UVNWICCI2T5XLJHJLS7AC3VANCNFSM4XLERGWA

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pints-team/pints/issues/1283#issuecomment-778812055, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABCILKAUIAHZ4GOQAXOD7ZDS7ADN7ANCNFSM4XLERGWA .

ben18785 avatar Feb 14 '21 17:02 ben18785

Hi David: it's not actually a bug, just the old definition of how this was handled for ESS. So, think it's ok to leave as is rather than update to a newer method for calculating it.

Ah ok. Would you mind if I changed it already?

According to the discussion in the Stan forum above, it seems that Stan's "old" version was not very well justified. They argue specifically for the NUTS sampler.

"Exact autocorrelations can happen only on odd lags (Geyer, 2011). By summing over pairs, the paired autocorrelation is guaranteed to be positive modulo estimator noise. This is the motivation behind the many termination criterion of Geyer (2011). Stan does not (yet) do the paired expectations because NUTS almost by construction avoids the negative autocorrelation regime. Thus terminating at the first negative autocorre- lation is a reasonable approximation for stopping when the noise in the autocorrela- tion estimator dominates."

According to Vehtari that doesn't seem to have a sound theoretical justification, but Geyer's original truncation criterion does.

"I’m trying to compare n_eff computation described in BDA3 and the implemented code, and I think there is a small difference. The comment in the beginning says the implementation matches BDA3. BDA3 (at least version 20150505) describes truncation of the autocorrelation series using Geyer’s initial positive sequence (Geyer, 1992) where truncation “T is the first odd positive integer for which $\widehat{\rho}{T+1}+\widehat{\rho}{T+2}$ is negative.”. If I understand the code (stan/src/stan/mcmc/chains.hpp) it’s using “T is the last positive integer for which \widehat{\rho}_{T} is positive.”, that is, it’s not using the sum of odd and even lags for which Geyer had a theoretical argument."

So I would argue to adapt to Stan (also given that I've implemented it already) :D

DavAug avatar Feb 14 '21 18:02 DavAug

The soon to be deprecated pints.effective_sample_size will still be computing the effective sample size as we had it before, so it wouldn't change anything for other scripts that have used this estimate.

DavAug avatar Feb 14 '21 18:02 DavAug

Yep, fine if you’ve already done it — thanks!

On 14 Feb 2021, at 18:46, David Augustin [email protected] wrote:

The soon to be deprecated pints.effective_sample_size will still be computing the effective sample size as we had it before, so it wouldn't change anything for other scripts that have used this estimate.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

ben18785 avatar Feb 14 '21 19:02 ben18785

I have created a diagnostics module with the following functions

  • [x] pints.diagnostics.autocorrelation
  • [x] pints.diagnostics.effective_sample_size
  • [x] pints.diagnostics.multivariate_rhat
  • [x] pints.diagnostics.rhat

All functions can handle multiple chains with multiple parameters.

  • [x] The effective sample size now uses the Geyer criterion as discussed above.
  • [x] The effective sample size can now also deal with multiple chains and approximate ESS according to Vehtari et al (goes beyond "simple" addition of ESS from chains)

DavAug avatar Feb 14 '21 20:02 DavAug

Open questions:

  1. Do we want to move the MCMCSummary as well into the diagnostics? I.e. deprecate it in the pints namespace and make it available in pints.diagnostics
  2. Do we want to change the MCMCSummary's ESS estimation to the now implemented ESS estimation across multiple chains following Vehtari et al (both is possible with the new ESS implementation)
  3. Does it make sense to also move plots that are specifically for diagnosting into the diagnostics namespace?

DavAug avatar Feb 14 '21 20:02 DavAug

Thanks @DavAug

In response to your questions:

  1. I don't mind. It probably makes most sense for it to go in the diagnostics module.
  2. Yes, if it's already implemented then let's switch.
  3. I'd keep plots where they are for now because I can see arguments either way (and it's less work to keep them where they are).

ben18785 avatar Feb 15 '21 12:02 ben18785

@DavAug this one looks quite finished to me?

MichaelClerx avatar Jul 04 '22 14:07 MichaelClerx

@DavAug this one looks quite finished to me?

Oh boy, I haven't looked at this for a long time. If I remember correctly, I didn't quite finish the testing, because it wasn't quite obvious to me at the time how to do it in a principled way. The diagnostics module itself should be ready though.

DavAug avatar Jul 04 '22 17:07 DavAug

Probably best to skip the principled bit, and just compare the output to a hardcoded result?

MichaelClerx avatar Jul 04 '22 17:07 MichaelClerx