adversarial-policies

Checkpointing support with Ray Tune

Open AdamGleave opened this issue 5 years ago • 1 comment

It would be nice to make modelfree.hyperparams.train_rl a tune.Trainable rather than a function, which would add checkpointing support and let us use the HyperBand and Population Based Training schedulers. Conceptually this is easy enough: we already support saving models via save_callbacks, and can restore using load_path. However, the interfaces don't quite line up: Ray expects _train to perform one small training step, with _save called in between steps. There's no good way to make Stable Baselines return part-way through training. We could call it repeatedly with a small total_timesteps, but then progress would be computed relative to each short call rather than the whole run, breaking annealers.
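To illustrate the annealing problem, here is a toy sketch. ToyLearner and linear_anneal are hypothetical stand-ins written for this example, not the Stable Baselines or Ray API; the only assumption is that a learner measures progress relative to the total_timesteps of the current learn() call.

```python
def linear_anneal(start, end, progress):
    """Interpolate from start to end as progress goes from 0 to 1."""
    return start + (end - start) * progress


class ToyLearner:
    """Hypothetical stand-in for an RL model; NOT the Stable Baselines API."""

    def __init__(self):
        self.num_timesteps = 0
        self.lr_history = []

    def learn(self, total_timesteps, step_size=100):
        # Assumption: progress is measured against this call's own
        # total_timesteps, so it starts from 0 on every call.
        for done in range(0, total_timesteps, step_size):
            progress = done / total_timesteps
            self.lr_history.append(linear_anneal(1.0, 0.0, progress))
            self.num_timesteps += step_size
        return self


# One long call: the learning rate decays steadily over the run.
one_call = ToyLearner().learn(total_timesteps=400)

# Many short calls, as a step-at-a-time Trainable would make between
# checkpoints: the annealer restarts from its initial value each time.
chunked = ToyLearner()
for _ in range(4):
    chunked.learn(total_timesteps=100)

print(one_call.lr_history)
print(chunked.lr_history)
```

Both learners see the same number of timesteps, but only the single long call ever lowers the learning rate. Fixing this would need some way to pass the overall progress into each short call, or to pause and resume a single long call.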

AdamGleave, Mar 16 '19 01:03

Add support for this once https://github.com/ray-project/ray/issues/6162 is closed or https://github.com/ray-project/ray/pull/6192 is merged.

AdamGleave, Nov 24 '19 02:11