firefox-translations-training Improve usability of running selected tasks

Improve usability of running selected tasks

Open eu9ene opened this issue 7 months ago • 3 comments

While working on the big training run and bug fixes, I ran into many issues with scheduling specific tasks. The graph and caches can be in an arbitrary state, and we should still be able to run the pipeline starting from specific stages while reusing the stages that ran before.

We currently have several tools to work with:

target-stage
start_stage and previous_group_ids
existing_tasks
pre-trained models
Git branches
Adding extra tasks to the graph

Usually, the problem is that it starts scheduling tasks I don't need, and I try to come up with a workaround using those tools. The current tools also add significant mental load and are hard to use when training tens of languages with fixes at the same time (see this PR). We should rethink this approach to make it more flexible and easier to use.

Maybe introducing a concept of state, similar to the data on disk in Snakemake, can help here. With Snakemake, there was an option to skip smart scheduling based on file creation dates and information about past runs, and instead just treat everything present on disk as completed tasks and schedule the rest.
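
To make the idea concrete, here is a minimal sketch of "treat whatever is already done as completed and schedule the rest". It is not how the current task graph works, and the stage names, the `graph` mapping and the `tasks_to_schedule` helper are all hypothetical, made up for illustration. It assumes the pipeline can be described as a mapping from each task to its direct dependencies, plus a set of tasks known to be complete.

```python
from typing import Dict, List, Set


def tasks_to_schedule(
    graph: Dict[str, List[str]],
    completed: Set[str],
    target: str,
) -> List[str]:
    """Return the tasks needed to reach `target`, in dependency order,
    skipping anything already recorded as completed."""
    order: List[str] = []
    visited: Set[str] = set()

    def visit(task: str) -> None:
        if task in visited or task in completed:
            # Completed tasks are reused as-is, so nothing upstream
            # of them is explored or rescheduled.
            return
        visited.add(task)
        for dep in graph.get(task, []):
            visit(dep)
        # Append only after all dependencies have been handled.
        order.append(task)

    visit(target)
    return order


# Hypothetical stage names, for illustration only.
graph = {
    "clean-corpus": [],
    "train-teacher": ["clean-corpus"],
    "translate": ["train-teacher"],
    "train-student": ["translate"],
    "evaluate": ["train-student"],
}
completed = {"clean-corpus", "train-teacher"}

print(tasks_to_schedule(graph, completed, "evaluate"))
# ['translate', 'train-student', 'evaluate']
```

The point of the sketch is that the only inputs are the graph, the target and the recorded state. There is no need to reason about which earlier task group or branch produced a given artifact.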

Jul 02 '24 18:07 eu9ene
