firefox-translations-training
Teacher does not continue training after pretraining on augmented corpus
I continue testing the pipeline and I see that almost all teacher models don't continue training, even after I increased patience by setting `early-stopping: 20`.

Currently, continuation happens by training new models on a parallel corpus using the `--pretrained-model` flag and the `model.npz.best-chrf.npz` checkpoint of the teacher that was pre-trained on an augmented corpus for 2 epochs.
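For reference, a minimal sketch of what this continuation step amounts to as a Marian invocation (directory and corpus names here are hypothetical, not the pipeline's actual paths):

```bash
# Sketch of the current continuation step, using Marian's standard flags.
# A *new* training run is started on the parallel corpus; --pretrained-model
# only initializes weights from the augmented-corpus teacher, so the optimizer
# state and training progress of the pre-training run are discarded.
marian \
  --model teacher-finetuned/model.npz \
  --pretrained-model teacher-base/model.npz.best-chrf.npz \
  --train-sets corpus.src.gz corpus.trg.gz \
  --vocabs vocab.spm vocab.spm \
  --valid-sets dev.src.gz dev.trg.gz \
  --valid-metrics chrf \
  --early-stopping 20
```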
Also, I see that the quality of every continuation model is somewhat worse than that of the pre-trained model, and it is the continuation model that we use for translation further down the pipeline.
I went with this approach after running into constant workflow issues with continuing training in the same folder. It seems this is not correct. Maybe we should copy `model.npz.optimizer.npz`, or the entire model directory, instead of using the `--pretrained-model` flag? @kpu @XapaJIaMnu
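As a sketch of that alternative (file names follow Marian's usual checkpoint layout; whether every one of them is present for a given run is an assumption), continuing in place would mean copying the full checkpoint, optimizer state included, and restarting Marian with the same `--model` path:

```bash
# Hypothetical sketch: copy the whole checkpoint so Marian resumes training
# instead of re-initializing. The .optimizer.npz file holds the optimizer
# state that --pretrained-model drops; .progress.yml holds training progress.
cp teacher-base/model.npz               teacher-finetuned/model.npz
cp teacher-base/model.npz.optimizer.npz teacher-finetuned/model.npz.optimizer.npz
cp teacher-base/model.npz.progress.yml  teacher-finetuned/model.npz.progress.yml
cp teacher-base/model.npz.yml           teacher-finetuned/model.npz.yml

# Restarting with the same --model path makes Marian pick up the copied
# checkpoint and continue from the saved state.
marian --model teacher-finetuned/model.npz \
  --train-sets corpus.src.gz corpus.trg.gz \
  --vocabs vocab.spm vocab.spm
```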
Increasing the early-stopping threshold can help, but it still does not properly fine-tune for some languages, presumably because of low data quality:
```yaml
training-teacher-base:
  # remove for low-resource languages or if training without augmentation
  after: 2e
  early-stopping: 20
training-teacher-finetuned:
  early-stopping: 40
```
@eu9ene Is this bug actionable? Should we close it if there are no more specific things to focus on?
It's essentially the same as https://github.com/mozilla/firefox-translations-training/issues/472. We now train everything in one run with OpusTrainer, and using the worse fine-tuned model is no longer a problem, since the pre-trained checkpoint will be used if training doesn't continue.