
[Training] During checkpoint restore, move the dataset iterator to the correct spot

Open mattdangerw opened this issue 3 years ago • 0 comments

Currently we use the BackupAndRestore callback to resume training in our examples after a failure. We also need to restore the dataset iterator to the position it had reached when the job was interrupted. Otherwise every restart reads from the start of the dataset again, and a job that is killed frequently will overtrain on the earliest examples.
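
To make the idea concrete, here is a minimal, framework-free sketch of fast-forwarding an iterator after a restore. It is not the keras-nlp implementation: `batches_to_skip`, `resume_iterator`, `make_epoch_iterator`, `global_step`, and `steps_per_epoch` are all hypothetical names, and the sketch assumes the restored checkpoint exposes a global step count and that batch order is the same on every run (no unseeded shuffling).

```python
import itertools


def batches_to_skip(global_step, steps_per_epoch):
    """Number of batches already consumed in the interrupted epoch (hypothetical helper)."""
    return global_step % steps_per_epoch


def resume_iterator(make_epoch_iterator, global_step, steps_per_epoch):
    """Return an iterator positioned just past the last batch that was trained on.

    Assumes make_epoch_iterator() yields batches in the same order on every call.
    """
    skip = batches_to_skip(global_step, steps_per_epoch)
    # islice discards the first `skip` batches and yields the rest of the epoch.
    return itertools.islice(make_epoch_iterator(), skip, None)


# Toy data: ten "batches" per epoch, job killed after 23 total steps.
batches = list(range(10))
resumed = list(
    resume_iterator(lambda: iter(batches), global_step=23, steps_per_epoch=10)
)
```

With a `tf.data` pipeline the same offset could presumably be applied with `dataset.skip(n)`, although skipping still has to read past the discarded elements, so saving the iterator state alongside the model checkpoint may be cheaper for large datasets.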

mattdangerw, May 24 '22 18:05