
Workflow

Open · jelmervdl opened this issue 2 years ago • 1 comment

Sorry for the bad title; I needed to jot down some notes.

Empty-train workflow, long version (maybe some steps can be skipped?)

  1. Select datasets
  2. Download each dataset
  3. Generate samples
  4. Select filters for each dataset
  5. Select a category for each dataset
  6. Run filters on each dataset (highly parallel)
  7. Combine and deduplicate datasets (parallel per category, maybe, #41)
  8. Create the trainer.py configuration using the categories and the deduplicated files from the previous step
  9. Generate alignments for placeholders and for guided-alignment training(?) (parallel)
  10. Run trainer.py to train model
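Steps 6 and 7 come down to a filter chain applied to each dataset, followed by a merge per category. A minimal sketch, assuming a dataset fits in memory as a list of (source, target) pairs and a filter is just a function from pairs to pairs. The names `run_filters`, `combine_and_dedup`, `strip_whitespace` and `drop_empty` are hypothetical and are not OpusCleaner's API; real corpora would be streamed from files rather than held in lists.

```python
from collections import defaultdict
from collections.abc import Callable, Iterable

# Hypothetical types: a sentence pair is (source, target); a filter maps
# an iterable of pairs to an iterable of pairs (dropping or rewriting some).
Pair = tuple[str, str]
Filter = Callable[[Iterable[Pair]], Iterable[Pair]]


def strip_whitespace(pairs: Iterable[Pair]) -> Iterable[Pair]:
    """Example filter: trim surrounding whitespace on both sides."""
    return ((s.strip(), t.strip()) for s, t in pairs)


def drop_empty(pairs: Iterable[Pair]) -> Iterable[Pair]:
    """Example filter: drop pairs where either side is empty."""
    return (p for p in pairs if p[0].strip() and p[1].strip())


def run_filters(pairs: Iterable[Pair], filters: list[Filter]) -> list[Pair]:
    """Step 6: apply the filters selected for one dataset, in order."""
    for f in filters:
        pairs = f(pairs)
    return list(pairs)


def combine_and_dedup(
    filtered: dict[str, list[Pair]], category_of: dict[str, str]
) -> dict[str, list[Pair]]:
    """Step 7: merge filtered datasets per category.

    Exact duplicate pairs are dropped within a category (first occurrence
    wins); the same pair may still appear in two different categories.
    """
    combined: dict[str, list[Pair]] = defaultdict(list)
    seen: dict[str, set[Pair]] = defaultdict(set)
    for name, pairs in filtered.items():
        cat = category_of[name]
        for pair in pairs:
            if pair not in seen[cat]:
                seen[cat].add(pair)
                combined[cat].append(pair)
    return dict(combined)
```

Because each category is deduplicated independently, the per-category merges have no shared state and could run in parallel, which matches the "parallel per category" note in step 7.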

At a minimum, we need a workflow manager to manage steps 4 to 7, and maybe step 8.
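The core of such a workflow manager is running each step once its inputs exist, with independent steps (such as filtering different datasets) running side by side. A minimal sketch, assuming every automated step is a zero-argument Python callable and the dependencies are known up front; it uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) and a thread pool. The function `run_workflow` and task names such as `filter:a` are hypothetical, not part of OpusCleaner. The interactive steps (4 and 5) would have to finish before the graph is built.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Callable


def run_workflow(
    tasks: dict[str, Callable[[], None]],
    deps: dict[str, set[str]],
    max_workers: int = 4,
) -> list[str]:
    """Run each task once all of its dependencies have finished.

    `deps` maps a task name to the names it depends on. Independent tasks
    run concurrently. Returns task names in the order they completed.
    """
    sorter = TopologicalSorter(deps)
    for name in tasks:
        sorter.add(name)  # include tasks that have no dependencies
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies

    finished: list[str] = []
    running: dict[Future, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose dependencies are all done.
            for name in sorter.get_ready():
                running[pool.submit(tasks[name])] = name
            # Block until at least one running task finishes.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                fut.result()  # re-raise any exception from the task
                finished.append(name)
                sorter.done(name)  # may unlock downstream tasks
    return finished
```

For example, with `deps = {"dedup:clean": {"filter:a", "filter:b"}, "config": {"dedup:clean"}}`, both filter tasks run in parallel, then deduplication, then config generation. A real implementation would likely also want caching of finished steps, so rerunning after changing one dataset's filters does not redo everything.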

jelmervdl, Jan 10 '23 16:01