Evgeny Pavlov
If W&B doesn't allow this, the workaround would be to rename runs, adding the experiment name as a suffix so that they are unique. For example `student-finetuned_opustrainer`.
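A minimal sketch of the suffix workaround; `unique_run_name` is a hypothetical helper, and the actual renaming would happen via the W&B API or at run creation time:

```python
def unique_run_name(run_name: str, experiment: str) -> str:
    """Append the experiment name as a suffix so W&B display names are unique.

    Hypothetical helper illustrating the workaround described above.
    """
    return f"{run_name}_{experiment}"


# e.g. unique_run_name("student-finetuned", "opustrainer")
# -> "student-finetuned_opustrainer"
```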
We discussed that W&B uses display names to label charts, and since those names are not unique the runs don't show up properly on the charts. We discussed the following...
We don't have any issues with our current workflows. The supported Python versions are specified in poetry config and Docker images: https://github.com/mozilla/firefox-translations-training/blob/04e9e9cdc369cc8efdf080d57eef805a61d2c35e/pyproject.toml#L9
Those settings would also be visible in the experiment config and would simplify analysis of the experiments.
@marco-c FYI since you already started some work on this. I wanted to run an evaluation using our tools at some point.
> See also https://arxiv.org/pdf/2302.14520.pdf. This one is on my list :) ["A PARADIGM SHIFT IN MACHINE TRANSLATION: BOOSTING TRANSLATION PERFORMANCE OF LARGE LANGUAGE MODELS"](https://arxiv.org/pdf/2309.11674.pdf) is another interesting one
WMT23: https://aclanthology.org/2023.wmt-1.1.pdf
I analyzed the results of https://arxiv.org/pdf/2302.09210.pdf and https://arxiv.org/pdf/2309.11674.pdf and also benchmarked [ALMA-13B-LoRA](https://huggingface.co/haoranxu/ALMA-13B) myself. The quality is pretty good and looks on par with Google API for xx-en and slightly worse...
Sure, [here](https://docs.google.com/spreadsheets/d/1O77Ap0zA5xMw0gbzfLBDPLKaM2zAAVgFeN8TF227fbg/edit?usp=sharing) it is, but that's basically it. I also wanted to benchmark on WMT23 but didn't have time for it. For the back-of-the-envelope calculations: for [this mono task](https://firefox-ci-tc.services.mozilla.com/tasks/K1wuw2XQRPu4SfwJyWf5mQ/runs/0/logs/public/logs/live.log) we...
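The back-of-the-envelope style of estimate could look like this; the function and its inputs are illustrative assumptions, not measurements from the linked mono task:

```python
def gpu_hours(total_sentences: int, sentences_per_second: float) -> float:
    """Estimate GPU-hours needed to translate a monolingual corpus.

    Illustrative sketch: throughput is assumed constant per GPU, so the
    total GPU-hours is independent of how many GPUs run in parallel.
    """
    return total_sentences / sentences_per_second / 3600.0


# e.g. 360,000 sentences at 100 sentences/sec -> 1.0 GPU-hour
```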
The empty cells for some languages are where the model failed to follow the prompt and translate all the examples, so like with all other LLM tasks it's not 100%...
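A sketch of the kind of check that surfaces those empty cells, assuming outputs are aligned line-by-line with the sources; `missing_translations` is a hypothetical helper, not part of our pipeline:

```python
def missing_translations(sources: list[str], outputs: list[str]) -> list[int]:
    """Return indices of source sentences the model failed to translate.

    An output counts as missing if it is absent (the model stopped early)
    or blank. Sketch only; real prompt-compliance checks may be stricter.
    """
    missing = []
    for i in range(len(sources)):
        out = outputs[i] if i < len(outputs) else ""
        if not out.strip():
            missing.append(i)
    return missing
```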