firefox-translations-training

Teacher does not continue training after pretraining on augmented corpus

Open · eu9ene opened this issue 2 years ago

I'm continuing to test the pipeline, and almost none of the teacher models continue training, even after I increased patience by setting `early-stopping: 20`.

Currently, continuation works by training a new model on the parallel corpus with `--pretrained-model` pointing to `model.npz.best-chrf.npz` of the teacher that was pre-trained on the augmented corpus for 2 epochs.
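A minimal Python sketch of that continuation step, assuming it is launched as a separate `marian` run. The `finetune_command` helper, directory names, and corpus paths are hypothetical; only the Marian options (`-c`, `--model`, `--train-sets`, `--pretrained-model`, `--early-stopping`) are real. According to Marian's option help, `--pretrained-model` only initializes the model weights, so the optimizer state from pre-training is not carried over.

```python
from pathlib import Path


def finetune_command(teacher_dir: Path, new_dir: Path, train_src: Path,
                     train_trg: Path, config: Path) -> list[str]:
    """Build a Marian command that starts a new model from the teacher's best weights.

    All paths are hypothetical placeholders for illustration.
    """
    return [
        "marian",
        "-c", str(config),                      # shared training config (hypothetical path)
        "--model", str(new_dir / "model.npz"),  # fresh directory for the continuation run
        "--train-sets", str(train_src), str(train_trg),  # original parallel corpus only
        # Initialize weights from the pre-trained teacher; optimizer state is not loaded.
        "--pretrained-model", str(teacher_dir / "model.npz.best-chrf.npz"),
        "--early-stopping", "20",
    ]


cmd = finetune_command(Path("teacher"), Path("teacher-finetuned"),
                       Path("corpus.en"), Path("corpus.de"), Path("config.yml"))
print(" ".join(cmd))
```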

Also, the quality of every continuation model is slightly worse than that of the pre-trained model, yet the continuation model is what we use for translation later in the pipeline.

I went with this approach after repeatedly running into workflow issues when continuing training in the same folder, but it seems this isn't correct. Maybe we should copy `model.npz.optimizer.npz`, or the entire directory, instead of using the `--pretrained-model` flag? @kpu @XapaJIaMnu
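The directory-copy alternative could look like the sketch below. It assumes Marian resumes training when the `--model` path already contains `model.npz` alongside its `model.npz.optimizer.npz`; the `clone_for_continuation` helper and the directory layout are hypothetical.

```python
import shutil
from pathlib import Path


def clone_for_continuation(teacher_dir: Path, new_dir: Path) -> list[str]:
    """Copy the whole pre-training directory so a new run can resume from it.

    Unlike --pretrained-model, this keeps model.npz.optimizer.npz (and any other
    state files written during pre-training) next to model.npz.
    Returns the sorted file names in the new directory.
    """
    shutil.copytree(teacher_dir, new_dir)  # fails if new_dir already exists
    return sorted(p.name for p in new_dir.iterdir())
```

One thing to check with this approach: the copied directory also holds whatever progress and validation state was saved during pre-training, which may not be what the fine-tuning stage should start from.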

eu9ene, Feb 25 '22 21:02

Increasing the early-stopping thresholds can help, but the teacher still doesn't fine-tune properly for some languages, which I assume is due to low-quality data.

```yaml
  training-teacher-base:
    # remove for low resource languages or if training without augmentation
    after: 2e
    early-stopping: 20
  training-teacher-finetuned:
    early-stopping: 40
```
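To make the `early-stopping` values above concrete, here is a simplified model of patience-based stopping, assuming a single higher-is-better metric such as chrF: training stops once the metric has failed to improve for `patience` consecutive validations. It illustrates the idea only and is not Marian's implementation; the `stops_at` helper and the scores are hypothetical.

```python
from typing import Optional


def stops_at(scores: list[float], patience: int) -> Optional[int]:
    """Return the validation index at which training stops, or None if it never does.

    A validation counts as an improvement only if it beats the best score so far.
    """
    best = float("-inf")
    stalled = 0
    for step, score in enumerate(scores):
        if score > best:
            best, stalled = score, 0  # new best: reset the patience counter
        else:
            stalled += 1
            if stalled >= patience:
                return step
    return None


print(stops_at([50.0, 50.5, 50.4, 50.3, 50.2], patience=3))  # stops at index 4
```

Raising the patience only delays the stop: if the metric never beats its early best, the run still ends without a better checkpoint.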

eu9ene, Jun 09 '22 18:06

@eu9ene Is this bug actionable? Should we close it if there's nothing more specific to focus on?

gregtatum, Apr 09 '24 21:04

It's essentially the same as https://github.com/mozilla/firefox-translations-training/issues/472. We now train everything in a single run with OpusTrainer, so ending up with a worse fine-tuned model is no longer a problem: if training doesn't continue to improve, the pre-trained checkpoint is used.
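The behavior described here can be sketched as tracking the best validation score across both stages of one run: if training on the original corpus never beats the best pre-training score, the pre-training checkpoint remains the best one. The `Checkpoint` type, the `best_checkpoint` helper, and the chrF values are hypothetical, and the sketch assumes the best score is tracked over the whole run, similar in spirit to how a `best-chrf` checkpoint is kept.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Checkpoint:
    stage: str   # hypothetical labels: "pretrain" (augmented) or "finetune" (original)
    step: int
    chrf: float


def best_checkpoint(history: list[Checkpoint]) -> Optional[Checkpoint]:
    """Return the checkpoint with the highest validation chrF over the whole run.

    max() keeps the first of any tied checkpoints, so a later stage must strictly
    improve on the earlier best to replace it.
    """
    return max(history, key=lambda c: c.chrf, default=None)


history = [
    Checkpoint("pretrain", 1000, 55.0),
    Checkpoint("pretrain", 2000, 56.2),
    Checkpoint("finetune", 3000, 55.9),  # fine-tuning never beats 56.2
    Checkpoint("finetune", 4000, 56.0),
]
print(best_checkpoint(history))  # the pre-training checkpoint at step 2000
```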

eu9ene, Apr 09 '24 21:04