
Use of trainer/tuner and number of training epochs

Open littlewine opened this issue 4 years ago • 1 comments

Hi, I have a question regarding choosing the epochs and doing hyperparameter tuning in general.

I am currently using matchzoo.trainers.trainer to train my models with the default number of epochs (10).

Does training always end at epoch 10, or does the trainer keep some sort of checkpoints and then restore the model from the epoch where the validation result is best? This is not very clear to me from the documentation, and there's a lot of confusion because matchzoo and matchzoo-py have different tutorials and documentation.

Apart from that, my question is:

  • If training always stops at the 10th epoch, how can I make it stop early and restore the model that achieves the best result on a validation metric? Ideally, I would like to do this with checkpoints, rather than using matchzoo.auto.tuner.tuner and re-training the model over and over, or some other hacky solution. I assume there is already something in place to do that.

  • If the trainer does restore the checkpoint with the highest score after the 10 epochs have finished running, which metric is used to determine the highest score? Is it just the first metric in the list of task.metrics?

Thank you for your help!

littlewine avatar May 07 '20 14:05 littlewine

@littlewine have you resolved this issue? In fact, the epoch number at which checkpoints are saved can be set in advance.

faneshion avatar Sep 20 '20 07:09 faneshion
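
For anyone landing on this thread with the same question, here is a minimal, library-agnostic sketch of the "keep the best checkpoint" pattern asked about above. It does not use the MatchZoo-py API: train_with_best_checkpoint, train_one_epoch, evaluate and patience are hypothetical names chosen for illustration, and the model state is a plain dict standing in for real weights. The idea is to snapshot the state whenever the validation score improves, optionally stop after a number of epochs without improvement, and return the best snapshot rather than the last one.

```python
import copy


def train_with_best_checkpoint(state, train_one_epoch, evaluate,
                               epochs=10, patience=None):
    """Train for up to `epochs` epochs and return the best-scoring snapshot.

    All names here are illustrative assumptions, not MatchZoo-py functions.
    `train_one_epoch(state)` mutates `state` in place; `evaluate(state)`
    returns a validation score where higher is better.
    """
    best_score = float("-inf")
    best_state = copy.deepcopy(state)
    best_epoch = 0
    stale_epochs = 0  # epochs since the last improvement

    for epoch in range(1, epochs + 1):
        train_one_epoch(state)
        score = evaluate(state)

        if score > best_score:
            # New best: keep an independent copy as the in-memory "checkpoint".
            best_score, best_epoch = score, epoch
            best_state = copy.deepcopy(state)
            stale_epochs = 0
        else:
            stale_epochs += 1
            if patience is not None and stale_epochs >= patience:
                break  # early stopping

    return best_state, best_epoch, best_score


# Toy example: the "model" is a single number whose score peaks at w == 3.
def toy_train_one_epoch(state):
    state["w"] += 1


def toy_evaluate(state):
    return -(state["w"] - 3) ** 2


state = {"w": 0}
best_state, best_epoch, best_score = train_with_best_checkpoint(
    state, toy_train_one_epoch, toy_evaluate, epochs=10, patience=2
)
print(best_epoch, best_score, best_state, state)
```

In the toy run the score peaks at epoch 3 and training stops two epochs later, so the returned snapshot differs from the final state. With a real model you would write the snapshot to disk instead of deep-copying it, and the score would be whichever validation metric you choose to monitor.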