
Monitor pl-en training

Open kpu opened this issue 2 years ago • 6 comments

Running in screen 86852 on the second half of alvis. It's currently in student training.

kpu avatar Mar 15 '22 22:03 kpu

Seems to have finished, but I'm a little concerned about the backtranslations that apparently were used to train the teachers.

ermann@alvis:~$ zcat /fs/surtr0/nbogoych/data/data/pl-en/pl-en-prod/translated/mono.pl.gz | head 
Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale
Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale Ale

ugermann avatar Mar 20 '22 21:03 ugermann

Oh my, that is terrible. The whole file is just "Ale" repeated at various lengths. @eu9ene the pipeline shouldn't continue if the quality is that bad.
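A check along these lines could catch this kind of output before later stages consume it. This is only a minimal sketch, not something the pipeline is known to contain: the function names, the thresholds (`max_share`, `max_fraction`, `sample_size`) and the idea of sampling the head of the gzipped file are all assumptions made for illustration.

```python
import gzip
from collections import Counter
from itertools import islice


def is_degenerate(line: str, max_share: float = 0.5, min_tokens: int = 5) -> bool:
    """Return True if one token makes up more than max_share of the line.

    Lines shorter than min_tokens are never flagged, since short sentences
    legitimately repeat words. Both thresholds are illustrative guesses.
    """
    tokens = line.split()
    if len(tokens) < min_tokens:
        return False
    most_common_count = Counter(tokens).most_common(1)[0][1]
    return most_common_count / len(tokens) > max_share


def degenerate_fraction(lines) -> float:
    """Fraction of lines flagged by is_degenerate (0.0 for empty input)."""
    total = 0
    bad = 0
    for line in lines:
        total += 1
        if is_degenerate(line):
            bad += 1
    return bad / total if total else 0.0


def check_translated_file(path: str, sample_size: int = 10000,
                          max_fraction: float = 0.1) -> None:
    """Raise RuntimeError if too many of the first sample_size lines are degenerate."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        fraction = degenerate_fraction(islice(handle, sample_size))
    if fraction > max_fraction:
        raise RuntimeError(
            f"{path}: {fraction:.1%} of sampled lines look degenerate; stopping"
        )
```

Calling `check_translated_file` on the translated `mono.pl.gz` right after the backtranslation step would abort the run on a file like the one above, where every line is a single repeated token.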

kpu avatar Mar 20 '22 21:03 kpu

I have seen this issue before: fp16 mode is broken for some reason. Disable fp16 mode in the configuration file for plen and, sigh, delete the translated files... I may try to do this tomorrow.

XapaJIaMnu avatar Mar 20 '22 22:03 XapaJIaMnu

I copied everything to /fs/surtr/germann/bergamot/train/plen and am currently re-running the pipeline from backtranslation onwards on alvis.

ugermann avatar Mar 20 '22 23:03 ugermann

I also discovered this issue while retraining my models. It was fixed a month ago: https://github.com/mozilla/firefox-translations-training/blob/22a3751a09dfdb2ba52f4d08c285e424c533dcde/configs/config.prod.yml#L64

I created an issue https://github.com/mozilla/firefox-translations-training/issues/78 to stop training if the quality is too low.

eu9ene avatar Mar 21 '22 18:03 eu9ene

There is also another issue: https://github.com/mozilla/firefox-translations-training/issues/75. I don't know how your training is configured, but when I was retraining the ru, pt, it -> en models, neither fine-tuned teacher improved on the pretrained teacher. Even worse, quality degraded, and the degraded models were still used further in the pipeline. I tried increasing patience, but it didn't help. We should investigate this.
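Until the cause is understood, one possible guard is to compare validation scores and fall back to the pretrained teacher when fine-tuning does not help. The sketch below is hypothetical: `select_teacher`, the `min_gain` margin and the idea that a single validation score (for example BLEU) is available for each model are assumptions, not part of the existing pipeline.

```python
def select_teacher(pretrained_score: float, finetuned_score: float,
                   min_gain: float = 0.0) -> str:
    """Pick which teacher to pass on to the next pipeline stage.

    Scores are assumed to be higher-is-better validation metrics measured
    on the same dev set. The fine-tuned model is kept only if it beats the
    pretrained one by more than min_gain.
    """
    if finetuned_score - pretrained_score > min_gain:
        return "finetuned"
    return "pretrained"


# Made-up scores: fine-tuning degraded quality, so the pretrained model wins.
choice = select_teacher(pretrained_score=30.0, finetuned_score=28.5)
print(choice)
```

This does not explain why fine-tuning degrades quality; it only keeps a worse model from being used further in the pipeline.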

eu9ene avatar Mar 21 '22 18:03 eu9ene