
Fault-tolerant training with dataloader

Open · yuvalkirstain opened this issue 1 year ago • 2 comments

🚀 The feature, motivation and pitch

When training with a dataloader, we might stop training in the middle of a run and restart it later. In that case we usually want to resume from where the previous run stopped. Currently, I have to iterate again over all of the data consumed in the previous run, which can be very slow. It would be nice to avoid this.

Alternatives

If we could save the state of the dataloader and load it later, or if there were a skipping mode that returns dummy batches until we reach the step where the previous run stopped, resuming an interrupted run would probably be much faster.
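One way to get both behaviours is to keep the resume position in the sampler instead of the data pipeline, so already-seen indices are skipped without their samples ever being loaded. The sketch below is plain Python and assumes a map-style dataset indexed by integers. `ResumableSampler`, `state_dict` and `load_state_dict` are hypothetical names (borrowing PyTorch's checkpointing naming) and are not part of `torch.utils.data`.

```python
import random
from typing import Dict, Iterator, List


class ResumableSampler:
    """Hypothetical index sampler whose position can be saved and restored.

    Yields dataset indices in an order fixed by (seed, epoch) and counts how
    many it has handed out, so a restarted run can skip those indices without
    loading the samples behind them.
    """

    def __init__(self, num_samples: int, seed: int = 0) -> None:
        self.num_samples = num_samples
        self.seed = seed
        self.epoch = 0
        self.position = 0  # indices already yielded in the current epoch

    def _order(self) -> List[int]:
        # Same (seed, epoch) always produces the same permutation.
        order = list(range(self.num_samples))
        random.Random(f"{self.seed}:{self.epoch}").shuffle(order)
        return order

    def __iter__(self) -> Iterator[int]:
        order = self._order()
        # Start at the saved position: skipped indices cost only a list lookup.
        while self.position < self.num_samples:
            idx = order[self.position]
            self.position += 1
            yield idx
        # Epoch exhausted: move on to the next epoch's permutation.
        self.epoch += 1
        self.position = 0

    def __len__(self) -> int:
        # Indices remaining in the current epoch.
        return self.num_samples - self.position

    def state_dict(self) -> Dict[str, int]:
        return {"seed": self.seed, "epoch": self.epoch, "position": self.position}

    def load_state_dict(self, state: Dict[str, int]) -> None:
        self.seed = state["seed"]
        self.epoch = state["epoch"]
        self.position = state["position"]


# Reference: one uninterrupted epoch.
full = list(ResumableSampler(num_samples=10, seed=42))

# First run: consume 4 indices, then save the state before stopping.
sampler = ResumableSampler(num_samples=10, seed=42)
it = iter(sampler)
first_run = [next(it) for _ in range(4)]
state = sampler.state_dict()

# Second run: restore the state and finish the epoch.
resumed = ResumableSampler(num_samples=10)
resumed.load_state_dict(state)
rest = list(resumed)  # the 6 indices the first run never reached
```

In PyTorch this could be passed as `DataLoader(dataset, batch_size=..., sampler=sampler)`, with `sampler.state_dict()` saved next to the model checkpoint. One caveat: `position` counts indices handed out by the sampler. That should match what training has actually consumed only when the loader does not read ahead (e.g. `num_workers=0`). With worker processes the loader prefetches batches, so it is safer to record the number of completed steps in the training loop and restore `position = steps * batch_size`.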

Additional context

No response

yuvalkirstain · Mar 26 '23 07:03