keras-nlp
[Training] During checkpoint restore, move the dataset iterator to the correct spot
Currently we use the `BackupAndRestore` callback to resume training in our examples after a failure. On restore, we also need to move the dataset iterator to the correct position. Otherwise, a job that is killed frequently will keep restarting from the beginning of the dataset and accidentally overtrain on those early batches.
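A minimal sketch of the idea, in plain Python so it runs without TensorFlow. It assumes that the restored epoch and step are already known (how they are recovered from the `BackupAndRestore` backup is not shown), that the data arrives as one long stream of batches (for example, a repeated dataset), and that the batch order is deterministic across restarts (for example, a fixed shuffle seed). The helpers `batches_consumed` and `resume_iterator` are hypothetical names, not keras-nlp or Keras APIs.

```python
import itertools
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batches_consumed(epoch: int, step: int, steps_per_epoch: int) -> int:
    """Hypothetical helper: batches already trained on before the restored (epoch, step)."""
    if epoch < 0 or not 0 <= step < steps_per_epoch:
        raise ValueError("epoch must be >= 0 and 0 <= step < steps_per_epoch")
    return epoch * steps_per_epoch + step


def resume_iterator(
    batches: Iterable[T], epoch: int, step: int, steps_per_epoch: int
) -> Iterator[T]:
    """Hypothetical helper: skip the batches seen before the failure.

    Assumes `batches` yields the same batches in the same order on every restart,
    so skipping the first N batches lands exactly where training stopped.
    """
    n = batches_consumed(epoch, step, steps_per_epoch)
    # islice lazily discards the first n batches instead of replaying them.
    return itertools.islice(iter(batches), n, None)


# Usage: a stand-in stream of 100 "batches", resumed at epoch 2, step 3.
it = resume_iterator(range(100), epoch=2, step=3, steps_per_epoch=10)
print(next(it))  # 23
```

With `tf.data`, `dataset.skip(n)` would play the role of `itertools.islice` here. Skipping by count is only correct if the order is reproducible; an alternative worth considering is saving the `tf.data` iterator itself in a `tf.train.Checkpoint`, which restores its position directly.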