
Resume training

Open rusu24edward opened this issue 2 years ago • 1 comments

It would be nice to be able to resume training either by (1) using the same configuration file or (2) passing the output directory to the train command.

According to the Ray documentation, you can resume a training run through Tune via

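from ray import tune

tune.run(
    train,  # the trainable (function or class) used in the original run
    # other configuration
    name="my_experiment",  # experiment directory to look for
    resume=True,  # continue from the experiment's last saved state
)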

This will look for an experiment directory named my_experiment in ~/ray_results. This won't work for Abmarl since we store the results in abmarl_results.

The API doc shows that we can also resume with

tune.run(my_trainable, config=space,
         local_dir="<path/to/dir>", resume=True)
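For option (2), passing the output directory to the train command, one possible approach is to split that directory into the name and local_dir arguments that tune.run expects. The sketch below is only an illustration: resume_kwargs is a hypothetical helper, not part of Abmarl or Ray, and it assumes the output directory has the layout <local_dir>/<experiment name>.

```python
from pathlib import Path


def resume_kwargs(output_dir):
    """Build tune.run keyword arguments from an existing output directory.

    Hypothetical helper: assumes output_dir is <local_dir>/<experiment name>.
    """
    path = Path(output_dir).expanduser()
    return {
        "name": path.name,              # experiment name is the last path component
        "local_dir": str(path.parent),  # parent directory, e.g. abmarl_results
        "resume": True,
    }


# Intended use (not executed here, requires ray):
# tune.run(my_trainable, config=space, **resume_kwargs(output_dir))
```

Whether resume=True picks up the experiment state correctly from a directory outside ~/ray_results still needs to be verified against the Ray version in use.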

Additionally, it looks like the restore parameter may also be used: we would pass the checkpoint file to tune.run and training would continue from there. This needs some testing to confirm.
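If we go the restore route, we would first need to locate the checkpoint to pass in. The following is a rough sketch of how the most recent one could be found. latest_checkpoint is a hypothetical helper, and the checkpoint_<number> directory naming is an assumption about how checkpoints are laid out on disk; it should be checked against what our training runs actually write.

```python
import re
from pathlib import Path

# Assumed naming scheme: one directory per checkpoint, e.g. checkpoint_000010
CHECKPOINT_PATTERN = re.compile(r"checkpoint_(\d+)")


def latest_checkpoint(trial_dir):
    """Return the checkpoint directory with the highest number, or None."""
    candidates = []
    for entry in Path(trial_dir).iterdir():
        match = CHECKPOINT_PATTERN.fullmatch(entry.name)
        if match and entry.is_dir():
            candidates.append((int(match.group(1)), entry))
    if not candidates:
        return None
    # Compare by the numeric suffix so that 10 sorts after 5
    return max(candidates, key=lambda item: item[0])[1]


# Intended use (not executed here, requires ray):
# tune.run(my_trainable, config=space, restore=str(latest_checkpoint(trial_dir)))
```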

We also need to decide whether the new data should be stored in the same directory that the checkpoint was loaded from.

rusu24edward, Jan 25 '23 19:01

Stage uses trainer.restore, so maybe we can use that in conjunction with tune.run(my_trainable).

rusu24edward, Nov 10 '23 02:11