AtacWorks
Fix resume functionality in training.
Currently, resume only reads the weights from an existing model and starts training with them instead of with randomly initialized weights. This is only partially correct. Ideally, we would save the full training state in a checkpoint file and resume training exactly where we left off (including the learning rate, random seed, current batch, etc.).
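As a rough illustration of what "saving all the states" could look like, here is a minimal, framework-agnostic sketch using only the Python standard library. The function names `save_checkpoint` and `load_checkpoint` and the dictionary keys are hypothetical and are not taken from the AtacWorks code. In a PyTorch training loop the same fields would presumably be filled from `model.state_dict()`, `optimizer.state_dict()` (which carries the learning rate), and `torch.get_rng_state()`.

```python
import os
import pickle
import random
import tempfile


def save_checkpoint(path, epoch, batch_idx, weights, lr, rng_state):
    """Write the full training state to `path` (hypothetical helper).

    Assumption: `weights` is any picklable object standing in for the
    model parameters; a real implementation would store the framework's
    own model and optimizer state instead.
    """
    state = {
        "epoch": epoch,          # epoch to resume from
        "batch_idx": batch_idx,  # position within that epoch
        "weights": weights,      # model parameters
        "lr": lr,                # current learning rate
        "rng_state": rng_state,  # random generator state, for reproducibility
    }
    # Write to a temporary file first, then rename, so that an interrupted
    # save does not leave a truncated checkpoint behind.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Read a checkpoint and restore the random generator state."""
    with open(path, "rb") as f:
        state = pickle.load(f)
    random.setstate(state["rng_state"])
    return state
```

With something along these lines, a resumed run could restore `epoch`, `batch_idx`, and `lr` from the returned dictionary and continue drawing the same random sequence as the interrupted run, rather than only reloading the weights. Note that `pickle` should only be used to load checkpoint files from a trusted source.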