
Optimizer state isn't preserved across runs

Open corbt opened this issue 8 months ago • 2 comments

I often have to restart a run, whether to fix something in my reward function or to recover from an OOM or crash that broke training. Restarting the training process throws away the optimizer state. I'm worried that this leads to worse performance than letting a run go all the way through uninterrupted. Is it easy to save the optimizer state along with the weights so we can truly resume as if nothing happened?
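To illustrate the concern, here is a minimal sketch in plain Python. It is not ART's API: `adam_step`, `fresh_state`, `train`, and the toy quadratic loss are all hypothetical names made up for this example. It runs a scalar Adam optimizer three ways: uninterrupted, resumed from a checkpoint that includes the optimizer state, and resumed from the weights alone.

```python
import json
import math


def adam_step(w, grad, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar weight, mutating `state` in place."""
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad * grad
    m_hat = state["m"] / (1 - b1 ** state["t"])  # bias-corrected first moment
    v_hat = state["v"] / (1 - b2 ** state["t"])  # bias-corrected second moment
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


def fresh_state():
    """Optimizer state as it looks when a process starts from scratch."""
    return {"t": 0, "m": 0.0, "v": 0.0}


def train(w, state, steps):
    """Minimize the toy loss (w - 3)^2, whose gradient is 2 * (w - 3)."""
    for _ in range(steps):
        w = adam_step(w, 2.0 * (w - 3.0), state)
    return w


# 1. Uninterrupted run of 20 steps.
full = train(0.0, fresh_state(), 20)

# 2. Stop after 10 steps and checkpoint the weight AND the optimizer state.
state = fresh_state()
w_mid = train(0.0, state, 10)
checkpoint = json.dumps({"w": w_mid, "opt": state})

# Resume from the full checkpoint for the remaining 10 steps.
restored = json.loads(checkpoint)
resumed = train(restored["w"], restored["opt"], 10)

# 3. Resume from the weight only; the moments and step count start over.
weights_only = train(restored["w"], fresh_state(), 10)

print("uninterrupted:", full)
print("resumed with optimizer state:", resumed)
print("resumed with weights only:", weights_only)
```

In this toy setup the run resumed with its optimizer state follows the same trajectory as the uninterrupted one, while the weights-only resume drifts away from it because Adam's moment estimates and step count are reset. Whether that drift meaningfully hurts a real training run is exactly the open question here.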

corbt, Apr 21 '25 18:04