firefox-translations-training

Improve usability of running selected tasks

Open • eu9ene opened this issue 7 months ago • 3 comments

While working on the big training run and on bug fixes, I ran into many problems with scheduling specific tasks. The task graph and caches can be in an arbitrary state, and we should still be able to run the pipeline starting from specific stages while reusing the stages that already ran.

We currently have several tools to work with:

  1. target-stage
  2. start_stage and previous_group_ids
  3. existing_tasks
  4. pre-trained models
  5. Git branches
  6. Adding extra tasks to the graph

Usually, I notice a problem when the pipeline starts scheduling tasks I don't need, and then I try to come up with a workaround using those tools. The current tools also add significant mental load and are hard to use when training tens of languages with fixes at the same time (see this PR). We should rethink this approach to make it more flexible and easier to use.

Introducing a concept of state, similar to the data on disk in Snakemake, might help here. Snakemake had an option to skip its smart scheduling, which relies on file timestamps and information about past runs, and instead treat everything present on disk as completed tasks and schedule only the rest.
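A minimal sketch of that "whatever is on disk is done" rule, assuming a toy task graph where each task declares its dependencies and output files. The `Task` class, the `completed_from_disk` and `plan` helpers, and the task names and output paths are all hypothetical; they are not the pipeline's or Snakemake's actual API. Planning walks back from a target stage and stops at any task whose outputs already exist, so everything upstream of a completed task is never scheduled.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Set


@dataclass
class Task:
    # Hypothetical task description: a name, upstream task names, and output files.
    name: str
    deps: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


def completed_from_disk(tasks: Dict[str, Task], root: Path) -> Set[str]:
    """Treat a task as completed if all of its declared outputs exist under root.

    Timestamps and run history are deliberately ignored. A task that declares
    no outputs is never considered completed.
    """
    done = set()
    for task in tasks.values():
        if task.outputs and all((root / out).exists() for out in task.outputs):
            done.add(task.name)
    return done


def plan(tasks: Dict[str, Task], target: str, done: Set[str]) -> List[str]:
    """Return the tasks to schedule to reach `target`, in dependency order.

    Completed tasks are skipped, and their dependencies are not visited.
    """
    order: List[str] = []
    visited: Set[str] = set()

    def visit(name: str) -> None:
        if name in visited or name in done:
            return
        visited.add(name)
        for dep in tasks[name].deps:
            visit(dep)
        order.append(name)  # appended only after all of its dependencies

    visit(target)
    return order


# Hypothetical, simplified stages; real pipeline task names differ.
tasks = {
    "clean-corpus": Task("clean-corpus", [], ["corpus/clean.gz"]),
    "train-vocab": Task("train-vocab", ["clean-corpus"], ["vocab/vocab.spm"]),
    "train-teacher": Task("train-teacher", ["clean-corpus", "train-vocab"], ["teacher/model.npz"]),
    "evaluate-teacher": Task("evaluate-teacher", ["train-teacher"], ["teacher/eval.txt"]),
}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Simulate a previous run that finished cleaning and vocab training.
    for rel in ["corpus/clean.gz", "vocab/vocab.spm"]:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    done = completed_from_disk(tasks, root)
    print(plan(tasks, "evaluate-teacher", done))  # ['train-teacher', 'evaluate-teacher']
```

The trade-off is the one described above: because timestamps are ignored, stale outputs are reused rather than rebuilt, which is what makes resuming from an arbitrary state predictable.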

eu9ene, Jul 02 '24 18:07