Bug report regarding continue training from an epoch

Open Los-Phoenix opened this issue 3 years ago • 0 comments

We managed to train ELQ on our own dataset. When we tried to resume training from a specific epoch on the same training data (to save time), the model appeared to stop improving: the loss decreases only very slowly, and the precision/recall/F1 scores stop changing.

The only change we made to the code and scripts was to set the ${12} (epoch) argument of train_elq.sh.

I expected the model to proceed as if training had never been stopped, or at worst to keep improving more slowly because of a change in learning rate. Instead, it stopped improving entirely. There must be a bug somewhere, probably in the learning rate handling here.
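
To illustrate the kind of learning-rate behaviour I have in mind, here is a minimal sketch assuming a linear warmup followed by linear decay. The function name, step counts, and base learning rate are all hypothetical and are not taken from the repository; I have not confirmed that this is what actually happens in train_elq.sh.

```python
def linear_warmup_decay(step, warmup_steps, total_steps):
    """Hypothetical LR multiplier: ramps 0 -> 1 during warmup, then decays linearly to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# All numbers below are made up for illustration only.
base_lr = 1e-5
steps_per_epoch = 1000
total_steps = 10 * steps_per_epoch
warmup_steps = 100

# Expected on resume from epoch 5: the schedule picks up at the matching global step.
resume_step = 5 * steps_per_epoch
lr_expected = base_lr * linear_warmup_decay(resume_step, warmup_steps, total_steps)

# One possible failure mode: the step counter ends up beyond total_steps,
# so the multiplier is clamped to 0 and the weights effectively stop updating.
lr_past_end = base_lr * linear_warmup_decay(12 * steps_per_epoch, warmup_steps, total_steps)

print(f"expected lr on resume: {lr_expected:.3e}")
print(f"lr if step counter overshoots: {lr_past_end:.3e}")
```

If the learning rate printed at the first step after resuming is zero or close to zero, that would be consistent with the behaviour we see, although other causes (for example optimizer state not being restored) could look similar.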

Los-Phoenix, Sep 11 '21 14:09