firefox-translations-training
Tune workspace dashboard to enable comparison across models and experiments
We have a great bar chart for comparing models within one experiment (a group of runs). We should figure out how to tune this dashboard, or rename the steps, so that we can also compare metrics and charts across experiments. The expectation was that this tool would replace TensorBoard, which does a good job of visualizing lines from different runs on one dashboard.
If W&B doesn't allow this, the workaround would be to add the experiment name as a suffix to each run name so that the names are unique, for example `student-finetuned_opustrainer`.
We discussed that W&B uses display names to show charts, and since those names are not unique across groups, it doesn't show the runs properly on the charts. We discussed the following potential solutions:
- We should figure out if there is a workaround on the W&B side, like using an internal ID that also includes a group name.
- Add a short suffix to each run name composed of a hash of the group name. We don't have a lot of groups per language, so the chance of a collision is pretty low and we can shorten the suffix. For example: `teacher-1_s7g3j`.
- Just use a counter for the groups per project and add a suffix with the current value. For example: `teacher-1_13`. We would need to load the project groups to calculate the value each time we need to publish something. Race conditions are possible in this case when we run multiple tasks at the same time, for example, if we run many evaluations all at once as the only tasks in the group. So the hashing approach seems to be more robust.
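A minimal sketch of the hashing approach from the list above, to make the idea concrete. The helper names `group_suffix` and `unique_run_name`, the choice of SHA-256, and the 5-character hexadecimal suffix are all assumptions for illustration, not anything the project has settled on. Because the suffix is derived only from the group name, every task in a group computes the same value without loading the project's groups, which is what avoids the race condition of the counter approach.

```python
import hashlib


def group_suffix(group_name: str, length: int = 5) -> str:
    """Hypothetical helper: a short, deterministic suffix for a group.

    Takes the first `length` hex characters of the SHA-256 digest of the
    group name. Shorter suffixes are more readable but collide more often.
    """
    digest = hashlib.sha256(group_name.encode("utf-8")).hexdigest()
    return digest[:length]


def unique_run_name(run_name: str, group_name: str, length: int = 5) -> str:
    """Hypothetical helper: append the group suffix to the run's display name."""
    return f"{run_name}_{group_suffix(group_name, length)}"


# Example: the same run name in one group always maps to the same unique name.
# "opustrainer" is used here only as an example group name.
name = unique_run_name("teacher-1", "opustrainer")
```

If such a name were also reused as the run ID, it could presumably be passed as both the `name` and `id` arguments of `wandb.init`, alongside `group`; whether that fits the existing publishing code is not something this sketch verifies.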
If we use a unique name, it could also be used as the ID for the run (making #610 trivial, which simplifies all the "resume" code).