
Automatically save best model during training

Open chriscyyeung opened this issue 2 years ago • 1 comments

I think that instead of (or maybe in addition to) saving after a fixed number of epochs, we could save the best model based on the validation loss.
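A minimal sketch of what that could look like, not tied to the actual aigt training script. The `BestCheckpoint` class, its `min_delta` parameter, and the use of `pickle` are all assumptions made for illustration; in a PyTorch loop the `state` argument would probably be `model.state_dict()` and the write would go through `torch.save` instead.

```python
import math
import pickle
from pathlib import Path


class BestCheckpoint:
    """Keep only the checkpoint with the lowest validation loss seen so far.

    Hypothetical helper for illustration; not part of the aigt code base.
    """

    def __init__(self, path, min_delta=0.0):
        self.path = Path(path)
        self.min_delta = min_delta  # improvement required before overwriting
        self.best_loss = math.inf
        self.best_epoch = None

    def update(self, epoch, val_loss, state):
        """Save `state` if `val_loss` improved; return True when a save happened."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            with open(self.path, "wb") as f:
                pickle.dump(state, f)
            return True
        return False
```

Calling `update(epoch, val_loss, state)` once per epoch, right after validation, overwrites a single file only when the validation loss improves, so this could run alongside any periodic saving rather than replace it.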

chriscyyeung avatar May 11 '23 19:05 chriscyyeung

Finding the best model is not as simple as saving the one with the minimum loss. Sometimes, when training for an extremely long time (thousands of epochs), the model can learn a better (more general) representation of the data without the loss decreasing. I couldn't find where I learned this, but it was related to this theory: https://medium.com/@MITIBMLab/estimating-information-flow-in-deep-neural-networks-b2a77bdda7a7

Saving the model regularly is generally good practice. We could add an option like "model_save_frequency". E.g. if it's 5, the model would be saved after every 5 epochs using names like model_005, model_010, etc. We could also log all the metrics for each saved model to the wandb report.
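A rough sketch of the naming logic, assuming epochs are counted from 1. The function name `periodic_checkpoint_name` and its `prefix` parameter are hypothetical, and treating a non-positive frequency as "disabled" is also an assumption.

```python
def periodic_checkpoint_name(epoch, model_save_frequency, prefix="model"):
    """Return a name such as 'model_005' if this epoch should be saved, else None.

    Hypothetical helper; `epoch` is assumed to be 1-based.
    """
    if model_save_frequency <= 0:
        return None  # assumption: non-positive values disable periodic saving
    if epoch % model_save_frequency != 0:
        return None
    return f"{prefix}_{epoch:03d}"


# Collect the names that would be produced over 15 epochs with a frequency of 5.
saved = []
for epoch in range(1, 16):
    name = periodic_checkpoint_name(epoch, 5)
    if name is not None:
        saved.append(name)
```

Whenever a name is returned, the training loop would save the model under that name, and the metrics for that epoch could be logged at the same point (for example with `wandb.log`) so each saved model has its numbers next to it in the report.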

I have also had positive experiences in the past with training for a few hundred more epochs after the metrics seemed to have stopped improving.

ungi avatar May 11 '23 22:05 ungi