firefox-translations-training

Publish training charts from Taskcluster

Open eu9ene opened this issue 1 year ago • 4 comments

This includes publishing:

  • live training logs to W&B dashboards

I assume we'll have separate publishing scripts for other things.

Let's use Taskgraph transforms so that we don't pollute the Taskcluster kinds with even more logic.

eu9ene, Jan 03 '24 19:01

We can start working on this now.

I identified these next steps to discuss together:

  1. Build a dedicated command in the tracking package to support specific tasks (training & evaluation), and use a Taskcluster secret to get the Weights & Biases token
  2. Set up the publication code through a Taskcluster transform
  3. Patch training shell scripts to capture output and publish from logs
    • when an env variable is set
    • scripts to patch
      • pipeline/train/train.sh
      • pipeline/train/spm-vocab.sh
  4. Identify other tasks to track & publish
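As a rough illustration of step 3, here is a minimal sketch of a wrapper that runs a training command, keeps its normal output, and forwards each line to a publisher only when an environment variable is set. The variable name `PUBLISH_TRAINING_LOGS` and the `publish` callback are hypothetical; the thread does not name either of them.

```python
import os
import subprocess
import sys
from typing import Callable, List

# Hypothetical variable name: the thread only says "when an env variable is set".
PUBLISH_ENV_VAR = "PUBLISH_TRAINING_LOGS"


def run_and_capture(cmd: List[str], publish: Callable[[str], None]) -> int:
    """Run cmd, echo its output, and forward each line to publish if enabled."""
    enabled = bool(os.environ.get(PUBLISH_ENV_VAR))
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so no log line is missed
        text=True,
    )
    assert proc.stdout is not None
    for line in proc.stdout:
        sys.stdout.write(line)  # keep the regular task log intact
        if enabled:
            publish(line.rstrip("\n"))
    # Propagate the exit code of the wrapped command to the caller.
    return proc.wait()
```

The wrapped command would be whatever invokes `pipeline/train/train.sh`, and `publish` would be whatever the tracking package exposes for sending parsed lines to W&B. Leaving the variable unset keeps the script's behavior unchanged.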

La0, Mar 11 '24 14:03

After discussing this with Evgeny, we can start building the dedicated script that interacts with the Taskcluster secret and logs.
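For the secret part, a minimal sketch could read the token through the Taskcluster proxy from inside a task. This assumes the task has the proxy feature enabled and a `secrets:get:<name>` scope; the secret name and the `token` key inside the secret are hypothetical, since the thread does not specify them.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Callable


def default_fetch(url: str) -> dict:
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def get_wandb_token(
    secret_name: str,
    fetch: Callable[[str], dict] = default_fetch,
) -> str:
    """Read the W&B token from a Taskcluster secret via the task proxy."""
    proxy = os.environ.get("TASKCLUSTER_PROXY_URL", "http://taskcluster").rstrip("/")
    # Secret names often contain slashes, so the whole name is percent-encoded.
    name = urllib.parse.quote(secret_name, safe="")
    payload = fetch(f"{proxy}/api/secrets/v1/secret/{name}")
    # The secrets service wraps the stored value in a "secret" object.
    # The "token" key is an assumption about how the value would be stored.
    return payload["secret"]["token"]
```

The `fetch` parameter only exists so that the lookup can be exercised outside of a Taskcluster task with a stub.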

La0, Mar 13 '24 19:03

One small correction: pipeline/train/spm-vocab.sh trains a vocab and we don't want to track this in W&B. So only train.sh should be patched.

eu9ene, Mar 13 '24 20:03

@evabardou will work on Taskcluster secret support.

We can also start experimenting with log parsing from CI using the --in-stream option.
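To make the idea of parsing logs from a stream concrete, here is a minimal sketch that consumes lines one at a time and yields training metrics. It is not the implementation behind `--in-stream`; the line shape (`Ep. N : Up. N : ... Cost X`) is an assumption about what the trainer prints, and the real output may differ.

```python
import re
from typing import Dict, Iterable, Iterator, Union

# Assumed progress-line shape, for illustration only:
#   "[2024-03-18 15:03:00] Ep. 1 : Up. 1000 : Sen. 50,000 : Cost 3.25 : Time 10.00s"
LINE_RE = re.compile(
    r"Ep\.\s*(?P<epoch>\d+)\s*:\s*Up\.\s*(?P<update>\d+)"
    r".*?Cost\s+(?P<cost>[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)

Metrics = Dict[str, Union[int, float]]


def parse_stream(lines: Iterable[str]) -> Iterator[Metrics]:
    """Yield one metrics dict per progress line; skip everything else."""
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a progress line (startup messages, warnings, ...)
        yield {
            "epoch": int(match["epoch"]),
            "update": int(match["update"]),
            "cost": float(match["cost"]),
        }
```

Because the function accepts any iterable of lines, it can be fed `sys.stdin` when the training output is piped in, or an open log file when replaying a finished task.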

La0, Mar 18 '24 15:03