UltraNest
`saved_logwt_bs` error after completion
- UltraNest version: 3.4.4
- Python version: 3.8.8
- Operating System: linux, hpc
Description
(Let me preface by saying I love this piece of code.)
I am solving a 6D fitting problem with UltraNest. Five of the parameters are angles and are therefore wrapped; the sixth is a plain parameter.
I am running it on an HPC cluster with 480 MPI tasks.
It crashes after it has converged to the maximum-likelihood point.
CHI : +0.000| +2.756 * | +3.142
NU : +0.00| +2.17 * +2.19 | +3.14
DELTA : +0.00| +1.88 * +1.91 | +3.14
TAU : +0.000| +1.569 * +1.577 | +6.283
PHI : +0.000| +3.799 * +3.803 | +6.283
OMEGA_SPIN: +0.050006283| +2.520271104 * +2.520271106 |+6.333179024
[ultranest] Explored until L=-2e+02 9.61 [-189.6137..-189.6128]*| it/evals=950560/566217664 eff=0.1679% N=16384
[ultranest] Likelihood function evaluations: 566217664
Traceback (most recent call last):
File "/hercules/results/sbethapudi/frbm/frb-modeling/python/run_precession_kp.py", line 171, in <module>
result = sampler.run (
File "/u/sbethapudi/.local/lib/python3.8/site-packages/ultranest/integrator.py", line 2173, in run
for result in self.run_iter(
File "/u/sbethapudi/.local/lib/python3.8/site-packages/ultranest/integrator.py", line 2539, in run_iter
self._update_results(main_iterator, saved_logl, saved_nodeids)
File "/u/sbethapudi/.local/lib/python3.8/site-packages/ultranest/integrator.py", line 2635, in _update_results
results = combine_results(
File "/u/sbethapudi/.local/lib/python3.8/site-packages/ultranest/netiter.py", line 897, in combine_results
recv_saved_logwt_bs = mpi_comm.gather(saved_logwt_bs, root=0)
File "mpi4py/MPI/Comm.pyx", line 1262, in mpi4py.MPI.Comm.gather
File "mpi4py/MPI/msgpickle.pxi", line 680, in mpi4py.MPI.PyMPI_gather
File "mpi4py/MPI/msgpickle.pxi", line 685, in mpi4py.MPI.PyMPI_gather
File "mpi4py/MPI/msgpickle.pxi", line 148, in mpi4py.MPI.Pickle.allocv
File "mpi4py/MPI/msgpickle.pxi", line 139, in mpi4py.MPI.Pickle.alloc
SystemError: Negative size passed to PyBytes_FromStringAndSize
This has happened in multiple runs, always with the same error. Looking through the traceback, it is failing in the gather step, after the iterations have completed.
In my code, after sampler.run I call store_tree, print_results, and plot_corner (see the sketch after the parameter list below).
What I Did
These are the parameters I pass to run; a sketch of the full call follows the list.
min_num_live_points=16384,
## target evidence uncertainty
dlogz=1e-1,
dKL=1e-1,
frac_remain=1E-9,
Lepsilon=1E-5,
## less than live_points
min_ess=8192,
max_num_improvement_loops=8,
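For context, here is a minimal sketch of how I set up and call the sampler. The likelihood, the prior transform, and the exact wrapped-parameter flags below are placeholders rather than my actual model code; only the run arguments match my real call.

import numpy as np
import ultranest

param_names = ['CHI', 'NU', 'DELTA', 'TAU', 'PHI', 'OMEGA_SPIN']

def transform(cube):
    # placeholder prior transform (unit cube -> angles); not my real priors
    return cube * 2 * np.pi

def loglike(theta):
    # placeholder Gaussian log-likelihood; not my real 6D model
    return -0.5 * np.sum((theta - 2.5)**2)

sampler = ultranest.ReactiveNestedSampler(
    param_names,
    loglike,
    transform,
    # five angles wrapped, one plain parameter (ordering assumed for illustration)
    wrapped_params=[True, True, True, True, True, False],
)

result = sampler.run(
    min_num_live_points=16384,
    dlogz=1e-1,          # target evidence uncertainty
    dKL=1e-1,
    frac_remain=1e-9,
    Lepsilon=1e-5,
    min_ess=8192,        # less than the number of live points
    max_num_improvement_loops=8,
)

# post-processing that runs right after sampler.run
sampler.store_tree()
sampler.print_results()
sampler.plot_corner()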
If you can reproduce this, please add a print statement to show what saved_logwt_bs contains and how large it is, just before this line:
recv_saved_logwt_bs = mpi_comm.gather(saved_logwt_bs, root=0)
in file "/u/sbethapudi/.local/lib/python3.8/site-packages/ultranest/netiter.py", line 897, in combine_results
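Something along these lines should do (a sketch only; I am assuming saved_logwt_bs supports len(), otherwise just report its pickled size):

import pickle
import sys

# temporary debug print, inserted just before the gather in combine_results
payload = pickle.dumps(saved_logwt_bs)
print('rank', mpi_comm.Get_rank(),
      'saved_logwt_bs type:', type(saved_logwt_bs),
      'len:', len(saved_logwt_bs),
      'pickled bytes:', len(payload),
      file=sys.stderr, flush=True)
recv_saved_logwt_bs = mpi_comm.gather(saved_logwt_bs, root=0)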
This is an error I have not seen before.
https://github.com/mpi4py/mpi4py/issues/23 suggests you may have crossed a 2 GB threshold that your MPI does not support. I guess this translates into a limit on the number of live points times the number of iterations; the latter grows with the improvement loops.
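A rough back-of-the-envelope check of that guess, using the numbers from the log above; the ~30 bootstrap realisations and the 8-byte weights are assumptions about the layout of saved_logwt_bs, so treat this as an order-of-magnitude estimate only:

iterations = 950_560                    # from the "it/evals" log line above
bootstraps = 30                         # assumed number of bootstrap realisations
ranks = 480                             # MPI tasks in this run
per_rank = iterations * bootstraps * 8  # ~0.23 GB of raw weights per rank
at_root = per_rank * ranks              # ~110 GB arriving at rank 0
print(per_rank / 1e9, at_root / 1e9)    # pickled gather buffers are limited to ~2 GB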
(Sorry to dig up this old issue; I need a large number of live points and am running into the 2 GB threshold more frequently.)
This SO answer recommends replacing the lower-case mpi4py methods with their upper-case counterparts.
I want to edit combine_results in netiter.py so that it uses the upper-case methods.
For lines 905-906 and 909-910: (1) is the gather followed by bcast equivalent to Allgather?
Also, (2) why is the bcast necessary? I am guessing only the results dictionary on the root is actually needed.
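For concreteness, here is the pattern I have in mind (a sketch only, not the actual netiter.py code; it assumes the per-rank array has the same shape and dtype on every rank):

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
x = np.full(4, comm.Get_rank(), dtype=float)

# (a) current pattern: lower-case gather to root, then bcast the result back out
#     (pickle-based, subject to the ~2 GB limit)
gathered = comm.gather(x, root=0)
gathered = comm.bcast(gathered, root=0)  # afterwards every rank holds the full list

# (b) proposed pattern: upper-case Allgather into a preallocated buffer
#     (buffer-based, no pickling; every rank ends up with the same data as in (a))
recv = np.empty((comm.Get_size(), x.size), dtype=float)
comm.Allgather(x, recv)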
Does this seem reasonable? Thanks in advance.
Have you already tried switching the MPI implementation? Your cluster should allow you to select a different one.
I just tried a toy example in which I perform a gather whose result exceeds 2 GB. Naturally, it gave the same error.
According to that SO answer, this is an inherent limitation of the lower-case methods, since they pickle their arguments internally, so I had never tried changing the MPI implementation before. I tried changing it now, and it does not change the outcome.
Using Allgather with the toy example works. I will update combine_results and let you know.
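Roughly, the toy example looks like this (a sketch; the array size is chosen so that the pickled payload at rank 0 exceeds 2 GB once about five ranks participate):

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
x = np.random.random(2**26)                  # 64 Mi doubles = 512 MB per rank

# comm.gather(x, root=0)                     # pickle-based: rank 0 fails with
#                                            # "Negative size passed to PyBytes_FromStringAndSize"

recv = np.empty((comm.Get_size(), x.size))   # note: needs nranks x 512 MB of memory on every rank
comm.Allgather(x, recv)                      # buffer-based: completes fine
print('rank', comm.Get_rank(), 'received', recv.nbytes / 1e9, 'GB')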
You could alternatively reduce the number of bootstrap rounds.
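Something like this, reusing the setup from your sketch above; I am assuming the bootstrap count is the num_bootstraps constructor argument (please check the name against your installed version):

sampler = ultranest.ReactiveNestedSampler(
    param_names, loglike, transform,
    wrapped_params=[True, True, True, True, True, False],
    num_bootstraps=10,  # fewer bootstrap realisations -> smaller saved_logwt_bs to gather
)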