Possible error in io_mp.py regarding restart (-r) of chains

Open subhajitghosh-phy opened this issue 4 years ago • 2 comments

Hi,

I think there is an error in how MontePython restarts chains. It originates in io_mp.py, in the definition of create_output_files(command_line, data).

Let's say I have the following files in my chains/ folder that I want to restart.

2020-06-13_3000000__1.txt 2020-06-15_3000000__1.txt

I am not using the chain-number option or mpi. I am also using -N 3000000.

When I restart the first chain, it is correctly copied into the new chain file named below and the run proceeds without problems.

2020-06-21_6000000__1.txt

But when I restart the second chain, it fails with an error. I traced the error back to line 300 of io_mp.py:

restartname = filebase + sep1 + str(int(chainbase)+suffix-1) + sep2 + fileext

In this case, the 'restartname' is 2020-06-15_3000000__2.txt and this file does not exist. Therefore the following command fails:

for line in open(restartname, 'r'): data.out.write(line)

The problem is caused by 'suffix'. It is set to 2 in this case because the output chain is named 2020-06-21_6000000__2.txt, presumably since the name ending in __1 is already taken by the first restarted chain.
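The naming problem can be reproduced outside MontePython. The sketch below is not the code in io_mp.py: split_chain_name is a hypothetical helper, and the separator values ("__" and ".") are assumptions inferred from the file names above. Only the expression for restartname is taken from the quoted line. restart_name_with_fallback shows one possible workaround, falling back to the path given to -r when the computed file is missing; it is an illustration, not a fix endorsed by the developers.

```python
import os


def split_chain_name(path):
    """Split '<base>__<n>.<ext>' into (base, n, ext). Hypothetical helper."""
    stem, _, ext = path.rpartition(".")
    base, _, number = stem.rpartition("__")
    return base, number, ext


def restart_name_as_quoted(restart_path, suffix):
    """Build the name with the expression quoted from io_mp.py."""
    filebase, chainbase, fileext = split_chain_name(restart_path)
    sep1, sep2 = "__", "."  # assumed separators, inferred from the file names
    return filebase + sep1 + str(int(chainbase) + suffix - 1) + sep2 + fileext


def restart_name_with_fallback(restart_path, suffix):
    """Possible workaround: use the file passed to -r if the computed one is missing."""
    candidate = restart_name_as_quoted(restart_path, suffix)
    return candidate if os.path.exists(candidate) else restart_path


# The output chain received suffix 2, so the computed name points at __2.txt
print(restart_name_as_quoted("2020-06-15_3000000__1.txt", suffix=2))
```

With suffix=1 the expression returns the original file name, which matches the first restart working and the second one failing.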

This part of the code was modified to incorporate MPI runs. I checked version 2.2 of MontePython, where the above lines read:

for line in open(command_line.restart, 'r'): data.out.write(line)

-- which worked for me.

I hope this will get corrected in the next version.

Best, Subhajit

subhajitghosh-phy, Jun 20 '20 21:06

Hi Subhajit,

You should check that the files are correctly copied. If I remember correctly, the restart option in MontePython 2.2 did not work properly.

The suffix chain number can be adjusted with the --chain-number flag, but I agree that currently the restart option is not designed to restart files from different dates. I'm not sure it's a good idea to change that behavior, since it would only be reasonable in specific cases (i.e. no covmat updates after the initial chain).

In your case with different dates I would suggest launching two separate instances of MontePython in parallel and passing the files separately, since MontePython does not actually require MPI to run and this easily gets around the date and chain number problem.

Best, Thejs

brinckmann, Jun 22 '20 01:06

Hi Thejs,

I have two different questions regarding the restart of chains.

  1. After a restart, the old chain files are copied into new chain files, but the old files are only deleted after the run has completed. With --update/--superupdate on, the code also recomputes the covariance matrix periodically while running and takes ALL the chain files into account, right? Can this cause sampling issues, since the old chain points are then weighted twice?

  2. When restarting, is it advisable to start from the best-fit point of the previous chains or from the last point of the previous chain?
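To make the concern in question 1 concrete, here is a toy one-parameter sketch. It assumes, without confirmation, that both the old file and its copy inside the new file would be read when the covariance is updated. The numbers are made up and nothing here calls MontePython.

```python
from statistics import mean, pvariance

old = [1.0, 2.0, 3.0]  # points from the chain being restarted (toy values)
new = [7.0, 8.0, 9.0]  # points added after the restart (toy values)

# The restarted chain file starts with a copy of the old points
restarted_file = old + new

# If the old file is still on disk and is read too, the old points count twice
all_files = old + restarted_file

print("restarted file only:", mean(restarted_file), pvariance(restarted_file))
print("old + restarted    :", mean(all_files), pvariance(all_files))
```

In this toy case the mean and variance shift toward the old points, which is the effect the question asks about.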

Thanks.

Best, Subhajit

subhajitghosh-phy, Jul 08 '20 19:07