
Resuming training from a state saved by an already-resumed run behaves weirdly

Open · Hirmuolio opened this issue 4 months ago · 0 comments

Resuming training from a saved training state behaves weirdly when that state was itself created by a run that had been resumed from an earlier state.

Example: 5 epochs, with the training state saved every epoch.

  • Blue line: Normal training from start to finish.
  • Red line: Resumed from the epoch-2 state. resume = "E:/training/output/test_1-000002-state"
  • Yellow line: Resumed from the first state saved (epoch 3) by the previously resumed run. resume = "E:/training/output/test_2-000003-state"

The first resumed run (red line) trains for 3 epochs and finishes at a total of 2+3=5 epochs, as expected. The resumed-from-resumed run (yellow line) trains for 4 epochs, resulting in a total of 2+1+4=7 epochs of training.

This may be as simple as "current_step" being saved with the wrong value in train_state.json, but I don't know the code well enough to tell whether that is the problem.
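A toy simulation can show whether the suspected cause would produce exactly the 7-epoch result. It is a hypothetical model, not sd-scripts code. It assumes that a resumed run derives its starting epoch as current_step // steps_per_epoch, and that the suspected bug is a resumed run saving only the steps it took itself instead of the running total. `MAX_EPOCHS`, `STEPS_PER_EPOCH`, `run` and `total_epochs` are illustrative names.

```python
# Hypothetical model of resume bookkeeping; NOT sd-scripts' actual code.
# Assumption: the saved state holds a step counter ("current_step"), and a
# resumed run starts at epoch current_step // STEPS_PER_EPOCH.

MAX_EPOCHS = 5
STEPS_PER_EPOCH = 10  # arbitrary; any positive value gives the same epoch counts


def run(saved_step: int, cumulative: bool) -> tuple[int, list[int]]:
    """Simulate one (possibly resumed) training run.

    saved_step: current_step read from the resumed state (0 for a fresh run).
    cumulative: True  -> states saved by this run record the total step count.
                False -> states record only steps taken since this run started
                         (the suspected bug).
    Returns (epochs trained by this run, current_step saved at each epoch end).
    """
    start_epoch = saved_step // STEPS_PER_EPOCH
    epochs_to_train = MAX_EPOCHS - start_epoch
    saved_steps = []
    for i in range(1, epochs_to_train + 1):
        steps_this_run = i * STEPS_PER_EPOCH
        saved_steps.append(saved_step + steps_this_run if cumulative else steps_this_run)
    return epochs_to_train, saved_steps


def total_epochs(cumulative: bool) -> int:
    # Blue: fresh run that saves a state after every epoch.
    _, blue_states = run(0, cumulative)
    # Red: resumes from blue's epoch-2 state (index 1) and trains 3 epochs.
    _, red_states = run(blue_states[1], cumulative)
    # Yellow: resumes from red's first saved state (labelled epoch 3).
    yellow_epochs, _ = run(red_states[0], cumulative)
    # 2 epochs before red's resume point + 1 epoch of red + yellow's epochs.
    return 2 + 1 + yellow_epochs


for cumulative in (True, False):
    label = "cumulative current_step" if cumulative else "per-run current_step"
    print(f"{label}: {total_epochs(cumulative)} total epochs")
# prints:
# cumulative current_step: 5 total epochs
# per-run current_step: 7 total epochs
```

Under these assumptions, a per-run counter makes the yellow run think it stopped after epoch 1, so it trains 4 more epochs (2+1+4=7), while a cumulative counter gives the expected 5. This only shows that the hypothesis is consistent with the graph; it does not confirm how sd-scripts actually writes train_state.json.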

Training settings used: training_tomls.zip

Hirmuolio, Aug 10 '25 16:08