
Not possible to merge runs with different number of batches

Open rodleiva opened this issue 5 months ago • 3 comments

Dynesty version 2.1.4 installed with pip

Describe the bug It is not possible to merge two runs with different numbers of batches (e.g., run 1 with 4 batches and run 2 with 2 batches). As a consequence, it is also not possible to merge three or more runs that each have the same number of batches if their bounds all differ: merging the first two produces a run with twice as many batches, which then cannot be merged with the third.

Setup Here is a simple script that reads three pickle files and attempts to merge them. Each pickle file has two batches: a baseline batch plus an additional batch added in mode=auto. This means that the bounds of the baseline batches are (-inf, inf), while the bounds of the additional batch are different in each run.

import pickle

from dynesty import utils as dyfunc

ls_fnpickle = ['data/checkpoint_it860a.pickle',
               'data/checkpoint_it860b.pickle',
               'data/checkpoint_it860c.pickle']
ls_results = []
for fn_pickle in ls_fnpickle:
    # Each checkpoint stores the sampler under the 'sampler' key.
    with open(fn_pickle, 'rb') as f:
        data = pickle.load(f)
    ls_results.append(data['sampler'].results)

# Merge all runs once every result has been collected.
results_merged = dyfunc.merge_runs(ls_results)

** Dynesty output **

Bug In the previous example, merging run 1 with run 2 works fine, but merging the result (1+2) with run 3 raises this error at utils.py line 1980.

-> 1980 if np.all(base_info['bounds'] == new_info['bounds']):
   1981     bounds = base_info['bounds']
   1982     boffset = 0

ValueError: operands could not be broadcast together with shapes (4,2) (2,2) 

For example, when attempting to merge two runs imported from checkpoint pickle files:

line 1980, in _merge_two
    if np.all(base_info['bounds'] == new_info['bounds']):
ValueError: operands could not be broadcast together with shapes (4,2) (2,2)
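The shape mismatch can be reproduced outside dynesty with plain NumPy. The sketch below is only an illustration: the two arrays are made-up stand-ins for base_info['bounds'] and new_info['bounds'], and it calls np.equal directly because how the == operator reports a shape mismatch can differ between NumPy versions.

```python
import numpy as np

# Hypothetical stand-ins for base_info['bounds'] (4 batches) and
# new_info['bounds'] (2 batches); the values are invented for illustration.
base_bounds = np.array([[-np.inf, np.inf], [1.0, 2.0],
                        [-np.inf, np.inf], [3.0, 4.0]])  # shape (4, 2)
new_bounds = np.array([[-np.inf, np.inf], [5.0, 6.0]])   # shape (2, 2)

try:
    # Element-wise comparison needs broadcast-compatible shapes;
    # (4, 2) and (2, 2) are not compatible, so this raises.
    np.equal(base_bounds, new_bounds)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The first axis is the number of batches, so any two runs whose batch counts differ (and neither is 1) will hit this path.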

Additional context My quick and dirty workaround was to replace utils.py line 1980 with this:

if len(base_info['bounds']) == len(new_info['bounds']) and np.all(base_info['bounds'] == new_info['bounds']):

However, I think that _merge_two() may need a more careful rework. For example, it could merge all the baselines into a single batch when their bounds are the same, and then append the additional batches. I'm not sure about this, though.
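To make the idea of sharing one batch between identical baselines concrete, here is a rough sketch of the bookkeeping it would need. It is not dynesty code: combine_batch_bounds is a hypothetical helper, the bounds are invented, and it only shows how identical bounds could map to a single entry while new bounds are appended.

```python
import numpy as np


def combine_batch_bounds(base_bounds, new_bounds):
    """Hypothetical helper: union of two bound lists plus an index remap.

    Returns the combined bounds and, for each batch of the new run,
    the row of the combined array that it should point to.
    """
    base = np.asarray(base_bounds, dtype=float).reshape(-1, 2)
    new = np.asarray(new_bounds, dtype=float).reshape(-1, 2)

    combined = [tuple(row) for row in base]
    remap = []
    for row in new:
        key = tuple(row)
        if key in combined:
            # Same bounds already present (exact float match): reuse that batch.
            remap.append(combined.index(key))
        else:
            # Unseen bounds: append as a new batch.
            combined.append(key)
            remap.append(len(combined) - 1)
    return np.array(combined), np.array(remap)


# Two runs, each with a baseline batch plus one extra batch (made-up bounds).
base_bounds = [(-np.inf, np.inf), (1.0, 2.0)]
new_bounds = [(-np.inf, np.inf), (3.0, 4.0)]
combined, remap = combine_batch_bounds(base_bounds, new_bounds)
print(combined.shape)  # (3, 2): the shared baseline appears once
print(remap)           # [0 2]
```

Whether collapsing batches this way is statistically appropriate for the merged run is a separate question that this sketch does not address.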

Note that if I merge N runs that each consist of a baseline batch with bounds (-inf, inf), the merge works fine. Other combinations of runs with M batches each should also work, as long as the batch bounds match. In that case each call to _merge_two() will produce M batches instead of 2xM as in the general case. I haven't tested this.
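The batch counts described above can be checked with a small standalone sketch of the guarded condition from the workaround. merge_bounds is a hypothetical function, not part of dynesty, and the concatenating branch is an assumption about what happens in the general case (it is consistent with the 2xM count mentioned above, but the issue only shows the matching branch).

```python
import numpy as np


def merge_bounds(base_bounds, new_bounds):
    """Hypothetical stand-in for the bounds handling around line 1980."""
    base = np.asarray(base_bounds, dtype=float)
    new = np.asarray(new_bounds, dtype=float)
    # Compare shapes first so mismatched batch counts never reach
    # the element-wise comparison.
    if base.shape == new.shape and np.all(base == new):
        return base, 0  # identical bounds: keep one copy, boffset = 0
    # Assumed general case: keep both sets of bounds and offset the
    # batch indices of the new run by the number of base batches.
    return np.concatenate((base, new)), len(base)


baseline = [(-np.inf, np.inf)]
run_a = [(-np.inf, np.inf), (1.0, 2.0)]
run_b = [(-np.inf, np.inf), (3.0, 4.0)]
run_c = [(-np.inf, np.inf), (5.0, 6.0)]

same, offset_same = merge_bounds(baseline, baseline)  # matching bounds
ab, offset_ab = merge_bounds(run_a, run_b)            # 2 + 2 batches
abc, offset_abc = merge_bounds(ab, run_c)             # 4 + 2 batches
print(len(same), len(ab), len(abc))  # 1 4 6
```

With the guard in place, the third merge (4 batches against 2) falls through to the concatenating branch instead of raising.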

rodleiva, Aug 28 '24 13:08