Upon running with 'resume', UltraNest terminates the run with the message 'No changes made. Probably the strategy was to explore in the remainder, but it is irrelevant already; try decreasing frac_remain.'

Open garvitagarwal290 opened this issue 1 year ago • 5 comments

  • UltraNest version: 3.4.6
  • Python version: 3.9.7
  • Operating System: Rocky Linux 8.5 (Green Obsidian)

Description

I have time series data, and I am fitting them using models with 9 and 15 parameters. The fit completes for most of the time series, but for a large fraction of them it runs for 96 hours, at which point it hits the wall-time limit of the HPC cluster that I am using. I then rerun the fit for these time series using the 'resume' feature. The program runs for about 10 minutes and then terminates (with Exit_status = 0), and the final output of the run looks like the following.

[ultranest] Likelihood function evaluations: 428273387
[ultranest] Writing samples and results to disk ...
[ultranest] Writing samples and results to disk ... done
[ultranest] No changes made. Probably the strategy was to explore in the remainder, but it is irrelevant already; try decreasing frac_remain.
[ultranest] done iterating.

logZ = -21590.640 +- 1.022
  single instance: logZ = -21590.640 +- 0.235
  bootstrapped   : logZ = -21602.796 +- 0.751
  tail           : logZ = +- 0.693
insert order U test : converged: True correlation: inf iterations

    per_bary            : 56.131191572319658│                   ▇                   │57.231191572319652    56.681191572319669 +- 0.000000000000014
    a_bary              : 44.586│                   ▇                   │45.686    45.136 +- 0.000
    r_planet            : 0.065 │                   ▇                   │0.183     0.125 +- 0.000
    b_bary              : 0.000001000000000000│         ▇                             │0.731980832173530938    0.181980832173530810 +- 0.000000000000000056
    ecc_bary            : 0.073 │                       ▇               │1.000     0.623 +- 0.000
    w_bary              : 331.093│                   ▇                   │332.193    331.643 +- 0.000
    t0_bary_offset      : -0.050│                   ▇                   │0.050     -0.001 +- 0.000
    M_planet            305550513862216355811887677440 +- 35184372088832
    r_moon              : 0.000000123800000000│                      ▇                │0.123799876200000006    0.069941430543806068 +- 0.000000000000000014
    per_moon            : 0.612 │                   ▇                   │1.712     1.162 +- 0.000
    tau_moon            : 0.0000010000000000000│   ▇                                   │0.5984283447503323528    0.0484283447503322806 +- 0.0000000000000000069
    Omega_moon          : 211.584305462244203│                   ▇                   │212.684305462244225    212.134305462244271 +- 0.000000000000057
    i_moon              : 96.861177229315487│                   ▇                   │97.961177229315510    97.411177229315484 +- 0.000000000000014
    M_moon              577461808577478822312542208 +- 343597383680
    q1                  : 0.084 │                       ▇               │1.000     0.634 +- 0.000
    q2                  : 0.0000010000000000000│  ▇                                    │0.5840102855886486477    0.0340102855886486102 +- 0.0000000000000000069

I just wanted to understand the meaning of this. Does this mean that the fitting has already converged, and I can use the model parameter estimates? If yes, then why did the program keep running for 96 hours? If no, then how can I avoid this situation? By decreasing frac_remain as suggested in the output above?

PS: This doesn't happen every time I use the resume feature. In many cases, the fitting resumes okay, runs for several hours, and terminates successfully and normally.
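
For context on what frac_remain refers to in the message above: in nested sampling, a common stopping rule compares the evidence accumulated so far with an estimate of the evidence still held by the live points, and stops once the remaining share falls below a threshold. The following is only a simplified sketch of that general rule, not UltraNest's internal code. The function names `remainder_fraction` and `should_stop` and the example numbers are hypothetical, and the remainder is approximated as the highest live-point likelihood times the remaining prior volume.

```python
import math


def remainder_fraction(logz_accumulated, logl_max_live, log_volume_remaining):
    """Estimate the share of the total evidence still left in the live points.

    Assumption for this sketch: the remainder is bounded by the highest
    live-point likelihood multiplied by the remaining prior volume.
    """
    logz_remain = logl_max_live + log_volume_remaining
    # Numerically stable log(Z_accumulated + Z_remain).
    hi = max(logz_accumulated, logz_remain)
    logz_total = hi + math.log(
        math.exp(logz_accumulated - hi) + math.exp(logz_remain - hi)
    )
    return math.exp(logz_remain - logz_total)


def should_stop(logz_accumulated, logl_max_live, log_volume_remaining,
                frac_remain=0.01):
    """Stop once the estimated remainder share drops below frac_remain."""
    frac = remainder_fraction(logz_accumulated, logl_max_live,
                              log_volume_remaining)
    return frac < frac_remain


# Hypothetical numbers: the remainder is e^-10 times the accumulated evidence.
print(should_stop(0.0, 0.0, -10.0, frac_remain=0.01))   # threshold already met
print(should_stop(0.0, 0.0, -10.0, frac_remain=1e-5))   # stricter threshold
```

Under this reading, a resumed run whose remainder is already below the threshold has nothing left to do, while a smaller frac_remain asks the sampler to keep integrating. In UltraNest, frac_remain is a keyword argument of the sampler's run() method (documented default 0.01), so decreasing it would look like `sampler.run(frac_remain=1e-4)`, where `sampler` stands for an existing ReactiveNestedSampler instance.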

garvitagarwal290, Dec 10 '22 16:12