Investigate optimizing the CI training run

Open gregtatum opened this issue 1 year ago • 2 comments

It would be nice to reduce the end-to-end time of the CI training run. In this run it took 1 hour and 25 minutes: https://share.firefox.dev/3I192Z3

The training steps take the longest. We could try using less data, fewer training epochs, or smaller models.

| Step             | Time  |
|------------------|-------|
| Teacher          | 12:55 |
| Student          | 12:43 |
| Finetune Student | 13:41 |
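
To put the three timings above in context, here is a rough sketch that totals them and compares the sum with the 1 hour 25 minute run. It assumes the times are in minutes:seconds and that the steps run one after another; the names `to_seconds`, `step_times`, and `total_run_seconds` are only for illustration.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string to seconds (assumed format)."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Timings copied from the table in this issue.
step_times = {
    "Teacher": "12:55",
    "Student": "12:43",
    "Finetune Student": "13:41",
}

# End-to-end time reported for the run: 1 hour 25 minutes.
total_run_seconds = 85 * 60

training_seconds = sum(to_seconds(t) for t in step_times.values())
training_share = training_seconds / total_run_seconds

print(f"training total: {training_seconds // 60}:{training_seconds % 60:02d}")
print(f"share of run:   {training_share:.0%}")
```

Under those assumptions the three training steps add up to 39:19, a bit under half of the run, so the rest of the time is spent in other tasks or waiting.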

@eu9ene, you mentioned something about quantization failures in a meeting when we discussed this. Can you elaborate on that?

gregtatum, Feb 21 '24 20:02

It's basically this: https://github.com/mozilla/firefox-translations-training/issues/455

> you mentioned something about quantization failures in a meeting when we discussed this, can you elaborate on that?

I'm not sure; we should investigate.

eu9ene, Feb 26 '24 17:02

Something happens around the log line "[taskcluster-proxy] Successfully refreshed taskcluster-proxy credentials:". I see two 20-minute gaps when this line appears: https://firefox-ci-tc.services.mozilla.com/tasks/TnmPTMeqSPWB727etTAAaw/runs/1/logs/live/public/logs/live.log
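
A rough way to spot gaps like these without scrolling through the log is to scan it for large jumps between consecutive timestamps. The sketch below assumes each relevant line carries an ISO 8601 UTC timestamp (for example `2024-05-22T17:00:05.000Z`); I have not checked that this matches the real log format. `find_gaps` is a hypothetical helper and the sample lines are made up for illustration, not copied from the task log.

```python
import re
from datetime import datetime, timedelta

# Assumed timestamp format; adjust the pattern to the actual log.
TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?Z")


def find_gaps(lines, threshold=timedelta(minutes=10)):
    """Return (gap, line_before, line_after) for every pair of consecutive
    timestamped lines that are more than `threshold` apart."""
    gaps = []
    prev_time, prev_line = None, None
    for line in lines:
        match = TIMESTAMP.search(line)
        if not match:
            continue  # skip lines without a recognizable timestamp
        current = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
        if prev_time is not None and current - prev_time > threshold:
            gaps.append((current - prev_time, prev_line, line))
        prev_time, prev_line = current, line
    return gaps


# Invented sample lines, only to show the shape of the output.
sample_log = [
    "[task 2024-05-22T17:00:00.000Z] training step 100",
    "[taskcluster-proxy 2024-05-22T17:00:05.000Z] Successfully refreshed taskcluster-proxy credentials:",
    "[task 2024-05-22T17:20:10.000Z] training step 200",
]

for gap, before, after in find_gaps(sample_log):
    print(gap, "|", before, "->", after)
```

Running something like this over the downloaded live.log would show which lines sit on either side of each gap, which should help tell whether the credential refresh causes the stall or just happens to be logged during it.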

eu9ene, May 22 '24 18:05