
dynesty 1.2.3 issue changes values of weights compared to 0.9.5.3

Open mmfausnaugh opened this issue 1 year ago • 3 comments

Describe the bug I recently upgraded my laptop and moved from dynesty 0.9.5.3 to 1.2.3. I'm finding different results using dynesty on the same data after the change: in particular, some runs result in weights equal to or less than zero, which invalidates some of the parameter estimations and causes trace/corner plots to fail.

Setup I am running version 1.2.3 and 0.9.5.3, both installed with pip. The setup is quite complicated, but I've made a public repository that I think will reproduce the behavior: https://github.com/mmfausnaugh/SN_analysis_debug If you clone this repository and run python sim_detection_limit.py, it will make synthetic data, run dynesty on it, save the chains to .npz files, and visualize models with the medians of the posteriors in a .png file.

Please let me know if you have issues running this code.

Bug There are no errors, so perhaps this is not strictly a bug.
(update: removed plots of sampling weights because of a math error)

The behavior I expected was consistent results between 1.2.3 and 0.9.5.3 (if not identical, since I am seeding the random numbers, although I'm not sure how the different processor architectures affect that expectation).
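For reference, here is a minimal sketch of how I seed a run, with a toy likelihood and prior standing in for my actual model (this is not the exact code in sim_detection_limit.py, and the rstate/Generator usage is my reading of the 1.x docs rather than something verified here):

```python
import numpy as np
import dynesty

ndim = 2

def loglike(theta):
    # toy stand-in for the real light-curve likelihood
    return -0.5 * np.sum(theta**2)

def prior_transform(u):
    # map the unit cube onto a flat prior on [-10, 10]
    return 20.0 * u - 10.0

# in 1.x, rstate is (as far as I can tell) a numpy Generator;
# in 0.9.x I relied on the global numpy seed instead
rstate = np.random.default_rng(42)

sampler = dynesty.NestedSampler(loglike, prior_transform, ndim, rstate=rstate)
sampler.run_nested()
results = sampler.results
```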

To reproduce the behavior I see, please see https://github.com/mmfausnaugh/SN_analysis_debug.

When running my plotting code on some chains from real data, the weights go to zero and I get this error:

/Users/faus/miniconda3/envs/py310/lib/python3.10/site-packages/dynesty/plotting.py:1363: UserWarning: Attempting to set identical bottom == top == 0.0 results in singular transformations; automatically expanding.
  ax.set_ylim([0., max(n) * 1.05])
/Users/faus/miniconda3/envs/py310/lib/python3.10/site-packages/dynesty/plotting.py:2305: RuntimeWarning: invalid value encountered in true_divide
  sm /= sm[-1]
WARNING:root:No points at all in the plotted region
Traceback (most recent call last):
  File "/Users/faus/SN_analysis/SN_analysis_debug/plot_dynesty_chains.py", line 44, in <module>
    fig,axes = dyplot.cornerplot(results,
  File "/Users/faus/miniconda3/envs/py310/lib/python3.10/site-packages/dynesty/plotting.py", line 1446, in cornerplot
    _hist2d(y,
  File "/Users/faus/miniconda3/envs/py310/lib/python3.10/site-packages/dynesty/plotting.py", line 2359, in _hist2d
    ax.contourf(X2,
  File "/Users/faus/miniconda3/envs/py310/lib/python3.10/site-packages/matplotlib/__init__.py", line 1412, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "/Users/faus/miniconda3/envs/py310/lib/python3.10/site-packages/matplotlib/axes/_axes.py", line 6317, in contourf
    contours = mcontour.QuadContourSet(self, *args, **kwargs)
  File "/Users/faus/miniconda3/envs/py310/lib/python3.10/site-packages/matplotlib/contour.py", line 812, in __init__
    kwargs = self._process_args(*args, **kwargs)
  File "/Users/faus/miniconda3/envs/py310/lib/python3.10/site-packages/matplotlib/contour.py", line 1441, in _process_args
    x, y, z = self._contour_args(args, kwargs)
  File "/Users/faus/miniconda3/envs/py310/lib/python3.10/site-packages/matplotlib/contour.py", line 1492, in _contour_args
    self._process_contour_level_args(args)
  File "/Users/faus/miniconda3/envs/py310/lib/python3.10/site-packages/matplotlib/contour.py", line 1194, in _process_contour_level_args
    raise ValueError("Contour levels must be increasing")
ValueError: Contour levels must be increasing
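As a sanity check before plotting, I now look at the importance weights directly; when they are all zero or piled onto a single point, cornerplot fails as above. A minimal sketch, assuming the saved .npz holds the arrays from the dynesty results dict (the filename and keys here are placeholders, not the exact files in my repo):

```python
import numpy as np

# placeholder filename; the keys mirror the dynesty results dict
chain = np.load("results_py310.npz")
logwt, logz = chain["logwt"], chain["logz"]

# normalized importance weights
weights = np.exp(logwt - logz[-1])

if not np.isfinite(weights).all() or weights.sum() == 0:
    print("degenerate weights -- corner/trace plots will fail")
else:
    # an effective sample size near 1 means everything sits in one bin
    ess = weights.sum() ** 2 / np.sum(weights ** 2)
    print("max weight:", weights.max(), "effective sample size:", ess)
```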

Additional context

Emailed to Josh Speagle:

I started my project some time ago and was using 0.9.5.3. Inspection of models from posterior draws shows that this run is quite good.

However, when using 1.2.3, I find very different posteriors, which seem to be "incorrect." When inspecting the weights, they seem very close to zero (plots are attached). In fact, if I try to make trace plots or corner plots with the updated version, it throws an error saying that it cannot find any points--inspection seems to indicate that every point is in a single bin and this causes errors when plotting with contours.

Draws from these posteriors are plausible but obviously worse fits to the data compared to posteriors drawn from 0.9.5.3. I also found large differences in the efficiency (which I take to be analogous to the acceptance fraction?) and in the log likelihood values at similar iterations (about iteration 4100 in what is below). Specific outputs are below, but I am seeing a factor of 10 worse efficiency and smaller log likelihood values (i.e., 10x more negative) on the new version.

Some more details are below. I can confirm that I am using exactly the same code and data. I can say that previous results produced good parameters based on the posterior draws with recalculated weights np.exp(result['logwt'] - result['logz'][-1]).
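(For completeness, the posterior draws are generated roughly along these lines; resample_equal is one way to do the equal-weight resampling and may not match my repo code exactly.)

```python
import numpy as np
from dynesty import utils as dyfunc

def posterior_draws(result):
    """Equal-weight posterior samples from a dynesty results object."""
    # importance weights, normalized by the final evidence estimate
    weights = np.exp(result['logwt'] - result['logz'][-1])
    # resample to an equally weighted set of posterior samples
    return dyfunc.resample_equal(result['samples'], weights)
```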

I first noticed this when moving my analysis to a new MacBook Pro with an M1 processor; however, I tried a new run on the intel machine with python 3.10, numpy 1.22.3, and dynesty 1.2.3, and ended up with identical results.

Any advice you can give would be most appreciated. The main goal from my perspective is to reproduce the old behavior from py3.6/dynesty 0.9.5.3 on py3.10/dynesty 1.2.3.

macOS 12.4, M1 chip; python 3.10; dynesty 1.2.3; numpy 1.22.3. Output from dynesty: 4165it [02:37, 26.50it/s, batch: 0 | bound: 323 | nc: 395 | ncall: 305404 | eff(%): 1.362 | loglstar: -inf < -15266.643 < inf | logz: -15281.873 +/- 0.174 | dlogz: 20115.679 > 0.010]

macOS 12.4, intel chip; python 3.6.8; dynesty 0.9.5.3; numpy 1.16.2. Output from dynesty: iter: 4145 | eff(%): 14.947 | logl*: -inf<-12385.6< inf | logz: -12400.8+/-0.2 | dlogz: 15312.2> 0.0

When running 1.2.3 on the intel chip, the efficiency stays near 15% for longer, but eventually plummets to 2% or lower.

mmfausnaugh avatar Aug 01 '22 21:08 mmfausnaugh

Hi,

While it is perfectly possible that there is a regression in 1.2.3, at the moment I am not sure that is clear from what you show.

  • It is hard to tell from the distribution of weights which one is right and which is wrong. Do you have some evidence demonstrating that the new one is wrong? It is also not 100% clear to me which weights you are talking about here -- weight parameters in your model, or dynesty sampling weights?

  • There could be an issue with plotting as well, as plotting is much less rigorously tested.

  • I am able to run your test code, but because it's doing multiple fits, it's unclear what I should look at. If you can isolate one fit for one specific dataset where you believe the results are incorrect wrt 0.9.5, that'd be helpful.

  • Regarding the lower efficiency, it's hard to tell at the moment. There were correctness fixes in the multi-modal case which made the sampling potentially less efficient. Whether that's the cause of the problems is hard to tell. I would first investigate whether the results are incorrect, and then, depending on the answer, we can look into efficiency.

Basically, from my point of view, in the case of differing results I would trust 1.2.3 over 0.9.5, but I'm happy to see a test case showing that 1.2.3 is problematic, so we can fix it.

segasai avatar Aug 02 '22 02:08 segasai

Thanks for your attention on this!

-I am talking about the sampling weights. I had an error in one of my plots, and the weights look more similar in the simulation, so I've deleted those plots.

-I added .npz files with dynesty outputs to the debug repo that more clearly show my issue. These chains were derived from real data. The files are 2022exc_dynesty_py36.npz and 2022exc_dynest_py310.npz.

The main differences that I see are:
--With v0.9/py3.6, there are 42414 samples. With v1.2.3/py3.10, there are 511 samples.
--With v0.9, the max logz is 6643.2. With v1.2.3, the max logz is -17557.7.
--Runs with v1.2.3 fall well below 1% efficiency and take longer to run.
--Plotting the sampling weights shows a different distribution, though I'm not sure how to interpret these. FWIW, the differences are not as large when using simulated data, so I need to dig into this more (sorry about the previous message).
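For reference, this is roughly how I am comparing the two saved runs (a sketch; the array keys are assumptions about what np.savez wrote, and the filenames are the ones added to the repo):

```python
import numpy as np

runs = {
    "v0.9/py3.6": "2022exc_dynesty_py36.npz",
    "v1.2.3/py3.10": "2022exc_dynest_py310.npz",
}

for label, fname in runs.items():
    run = np.load(fname)
    # normalized importance weights, as in the dynesty docs
    weights = np.exp(run["logwt"] - run["logz"][-1])
    print(f"{label}: {run['samples'].shape[0]} samples, "
          f"max logz = {run['logz'].max():.1f}, "
          f"max weight = {weights.max():.3g}")
```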

I will work today to add code that makes the results from real data reproducible. (There are several extra modules that run on real data and it will take some thought about how to push this to the debug repo.)

mmfausnaugh avatar Aug 02 '22 12:08 mmfausnaugh

From what you describe, the concerning thing is the max(logz); those should be similar if both fits converged. The most likely reason for the discrepancy (I'm guessing here) is that one of the posterior modes is missed. If that's the case, it would also be stochastic, differing from run to run (depending on the random seed). If the posterior mode is more frequently missed in 1.2.3 than in 0.9.5, that is indeed a problem.
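One way to check this (a rough sketch; run_fit is a hypothetical wrapper standing in for your actual fitting code) is to repeat the fit with several independent seeds on both versions and compare the final logz values: a mode that is only sometimes found shows up as a cluster of much lower logz values.

```python
import numpy as np
import dynesty

def run_fit(loglike, prior_transform, ndim, seed):
    # hypothetical wrapper around the actual fitting setup
    rstate = np.random.default_rng(seed)
    sampler = dynesty.NestedSampler(loglike, prior_transform, ndim,
                                    rstate=rstate)
    sampler.run_nested(print_progress=False)
    return sampler.results['logz'][-1]

# e.g. logzs = [run_fit(loglike, prior_transform, ndim, seed) for seed in range(10)]
# a bimodal spread in logzs across seeds would indicate a missed mode
```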

segasai avatar Aug 02 '22 14:08 segasai

Just checking if there is still an issue, @mmfausnaugh, or whether we can close this.

segasai avatar Aug 30 '22 12:08 segasai

I'll close the issue for now. Feel free to reopen if the issue is still there.

segasai avatar Sep 08 '22 16:09 segasai