kge
Saving checkpoint before evaluation
Currently we store checkpoints after evaluation. If we encounter an error during evaluation for some reason (e.g., an OOM), we lose the complete epoch. Therefore, we should store the checkpoint before (or even while) we run the evaluation code.
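The proposed ordering could look roughly like the sketch below. This is not kge's actual API; `train_epoch`, `save_checkpoint`, and `evaluate` are hypothetical names, and the in-memory `checkpoints` dict stands in for files on disk. The point is only that the checkpoint is persisted before evaluation runs, so an evaluation crash cannot discard the finished epoch.

```python
checkpoints = {}  # stands in for checkpoint files on disk

def train_epoch(model):
    model["weights"] += 1  # dummy training step

def save_checkpoint(model, epoch):
    checkpoints[epoch] = dict(model)  # snapshot taken before evaluation

def evaluate(model):
    raise RuntimeError("CUDA out of memory")  # simulate an eval-time OOM

def run_epoch(model, epoch):
    train_epoch(model)
    save_checkpoint(model, epoch)  # persisted even if evaluation fails below
    try:
        return evaluate(model)
    except RuntimeError:
        return None  # the epoch survives on disk despite the failure

model = {"weights": 0}
result = run_epoch(model, epoch=1)
print(result, checkpoints)
```

With this ordering, the simulated OOM during evaluation leaves `checkpoints` containing the epoch's weights, whereas saving after evaluation would have lost them.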
I think this problem can also be avoided by evaluating the model before the training phase, since saving the checkpoint after evaluation lets us keep the best model and supports early stopping well.