dynesty icon indicating copy to clipboard operation
dynesty copied to clipboard

Do not compute n_effective when limit is infinite

Open edbennett opened this issue 1 year ago • 15 comments

Currently the maximum number of effective samples n_effective in Sampler.sample() defaults to np.inf, implying no limit, yet self.n_effective is still computed (and then compared with np.inf), which takes time. The code already checks for None; this PR extends that check to also check that n_effective is not positive infinity before computing n_effective; this cuts the time spent in each call to Sampler.sample() by about half in the tests I've run.

edbennett avatar Jul 18 '22 20:07 edbennett

Thanks, it looks good to me, I'll merge as soon as the tests are done. Can you please also add a line about the change to CHANGELOG.md ?

segasai avatar Jul 18 '22 21:07 segasai

Pull Request Test Coverage Report for Build 2728995896

  • 16 of 17 (94.12%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.2%) to 90.443%

Changes Missing Coverage Covered Lines Changed/Added Lines %
py/dynesty/sampler.py 8 9 88.89%
<!-- Total: 16 17
Totals Coverage Status
Change from base Build 2455805807: 0.2%
Covered Lines: 3672
Relevant Lines: 4060

💛 - Coveralls

coveralls avatar Jul 18 '22 21:07 coveralls

The decrease in coverage revealed the issue with this (I saw it before, but didn't have a good solution)

The n_effective is defined as one of the arguments of sample_initial

https://github.com/joshspeagle/dynesty/blob/8bf1b3144fb4bb57ad771968a9fa0edd2b8fb1f0/py/dynesty/dynamicsampler.py#L672

but it is not used in actual sampling

https://github.com/joshspeagle/dynesty/blob/8bf1b3144fb4bb57ad771968a9fa0edd2b8fb1f0/py/dynesty/dynamicsampler.py#L846

sample_batch() also does not use n_effective directly, but this exactly how it should be, because n_effective is used in the stopping function.

So presumably two things need to be done here

  • add a test for the static sampler using n_effective in run_nested()
  • fix the sample_initial() of dynamic sampler to use n_effective ?

segasai avatar Jul 18 '22 22:07 segasai

Ah yes, it makes sense that if none of the tests explicitly set n_effective then coverage will drop as a result of this change. I agree that that looks like a behaviour that should be tested, although I'd need to think more carefully about what a good test of this would be.

Regarding sample_initial(), my first thought was that this couldn't pass n_effective through directly as it was calling sample() in a loop, but then I noticed that the loop at line 844 is for i in range(1)—i.e. the loop executes exactly once (and with nothing like continue or break to interrupt this). Do you know what the intention of this loop was?

edbennett avatar Jul 19 '22 08:07 edbennett

Yes, the range(1) seems to be a no-op and can be probably removed.

And yes the reason why I was a bit reluctant to make the change to sample_initial is that I am not really sure that the n_effective stopping criteria is of any use in the static nested run or initial phase of the dynamic nested run. Because in the static run the number of effective samples is expected to rise rapidly as soon as you start reaching the bulk of the posterior mass, and after that it's expected to basically stop rising, so stopping earlier is not really useful, as it will either truncate the run too early, or will be an essentially no-op. If anything, I'd rather remove that option completely.

segasai avatar Jul 19 '22 10:07 segasai

In that case, does it seem better to remove it from sample_initial() but leave it in place (with a new test) in sample(), so that downstream code that uses sample() directly isn't affected (and can make their own decision about whether the algorithm is appropriate for them)?

If it is removed from sample_initial(), would that need a period with a deprecation warning (either with or without fixing the fact that currently it is a placebo)? Or just a straight removal?

edbennett avatar Jul 19 '22 12:07 edbennett

I think the right way is to put the deprecation warning and make the n_effective a no-op in sample_initial() That also means we need to deprecate n_effective_init in run_nested() of dynamic sampler and n_effective in run_nested() of static sampler. I don't know if @joshspeagle has an opinion on this, but I don't really see the point of n_effective for static/initial sampling (unless I miss something). But I guess we can keep the n_effective in .sample() for the time being.

segasai avatar Jul 19 '22 17:07 segasai

I don't know if @joshspeagle has an opinion on this, but I don't really see the point of n_effective for static/initial sampling (unless I miss something).

I don't either, so I'm fine with that being deprecated and/or ignored.

joshspeagle avatar Jul 19 '22 22:07 joshspeagle

Thanks both; to summarise, what remains to be done is I think:

  • [x] a test for Sampler.sample() with n_effective != np.inf
  • [x] add a deprecation warning when sample_initial() and run_nested() are called with n_effective and n_effective_init != np.inf respectively

edbennett avatar Jul 19 '22 22:07 edbennett

I've added a DeprecationWarning to the following parameters, and mentioned the deprecation in the docstrings.

  • The n_effective parameter of Sampler.run_nested (default is None)
  • The n_effective parameter of DynamicSampler.sample_initial() (default is np.inf)
  • The n_effective_init parameter of DynamicSampler.run_nested() (default is np.inf). As this option is actually not implemented, I've also removed the description of the parameter from the dostring.

I noticed that DynamicSampler.run_nested() also has an n_effective parameter in addition to the n_effective_init parameter that we discussed—does this also want to be deprecated, or is this still needed?

Relatedly, the docstring for this parameter describes it as "Minimum number of effective posterior samples needed during the entire run. If the estimated "effective sample size" (ESS) exceeds this number, sampling will terminate." The second sentence appears to describe a maximum rather than a minimum?

edbennett avatar Jul 23 '22 11:07 edbennett

All of the changes discussed above are now made (other than the n_effective parameter to DynamicSampler.run_nested(), which has been worked on relatively recently so I think behaves differently and is still wanted?). Is there anything else needed before this can be merged?

edbennett avatar Aug 01 '22 08:08 edbennett

Hi @edbennett,

Sorry, I've been traveling, and didn't have much time. I'll try to review this in next couple of days.

segasai avatar Aug 01 '22 22:08 segasai

Thanks for the patch, I've looked at it now. The dynamic nested sampling n_effective is indeed used and should not be deprecated. I've left a small comment on that for the changelog. Otherwise the patch looks pretty sensible to me. My only thought now, is whether it's better for the user to still allow staticnested.run_nested(n_effective=XXX), but instead of deprecation warning throw a RuntimeWarning saying that it's a no-op. Thiis way it'll be easier for the user to switch from static to dynamic sampler. In all other places I'd keep deprecationwarnings.

segasai avatar Aug 02 '22 03:08 segasai

My only thought now, is whether it's better for the user to still allow staticnested.run_nested(n_effective=XXX), but instead of deprecation warning throw a RuntimeWarning saying that it's a no-op. Thiis way it'll be easier for the user to switch from static to dynamic sampler. In all other places I'd keep deprecationwarnings.

I would support this being a feature for the same reason (ease of switching between static and dynamic sampler objects).

joshspeagle avatar Aug 02 '22 03:08 joshspeagle

Thanks for your replies

My only thought now, is whether it's better for the user to still allow staticnested.run_nested(n_effective=XXX), but instead of deprecation warning throw a RuntimeWarning saying that it's a no-op.

I'm not sure this is true though? Sampler.run_nested() does pass the n_effective on to Sampler.sample(). It is DynamicSampler.sample_initial() that does a no-op with n_effective. Or is the suggestion to make this a no-op now rather than waiting through a deprecation period?

edbennett avatar Aug 02 '22 08:08 edbennett

Okay, to avoid delaying further, I'll merge it now. Thanks for the change! To be honest, we probably want to restructure this bit, i.e. completely get rid of this https://github.com/edbennett/dynesty/blob/4e0594eac0f2fdd224ab8b79dd348813a842e467/py/dynesty/sampler.py#L683 but maybe I'm missing some usecases of static.sample(n_effective=X)

segasai avatar Aug 15 '22 19:08 segasai

Many thanks Sergey!

To be honest, we probably want to restructure this bit, i.e. completely get rid of this https://github.com/edbennett/dynesty/blob/4e0594eac0f2fdd224ab8b79dd348813a842e467/py/dynesty/sampler.py#L683 but maybe I'm missing some usecases of static.sample(n_effective=X)

I know that Parallel Bilby surfaces this option, but I'm not sure whether it's ever used—I've asked the developers.

edbennett avatar Aug 16 '22 09:08 edbennett