
Partial execution of filter pipeline


In the category of lessons learned from Paracrawl:

It is sometimes very useful to split the actual filtering pipeline into a couple of stages that are then executed on different hardware. For example, bicleaner-ai and LASER benefit a lot from having access to GPUs; other steps might not. If we run all the CPU-heavy steps on GPU nodes, that's a waste of GPU budget.

I think we'd need a couple of things for this:

  1. An option in run.py to execute only a selection (or span) of steps.
  2. Make run.py able to resume from partially filtered input on stdin/disk and to write partially filtered output to disk (not much needs to be done for this, I think).
  3. Add hints to the filters about the resources they'd like, so we can match filters and hardware up better.
  4. Add some sort of hint to the UI to group filters with the same resource requirements together? Changing the order of the filters has an impact on their performance, so we can't do this automatically, but novices won't know about the performance characteristics of individual filters.
  5. I'm thinking of doing the step selection based on tags or something similar, not on the index of the filter step, because the number of filters may vary across datasets (see the sketch after this list).
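To make points 1, 2 and 5 a bit more concrete, here is a rough sketch of what the command-line side could look like. None of this is the current run.py interface: the `--span`/`--tags`/`--input`/`--output` flags and the `tags` field on pipeline steps are all hypothetical.

```python
# Sketch only: hypothetical step selection and resumption for a pipeline runner.
import argparse
import json
import sys

def load_pipeline(path):
    """Load the list of filter steps from a pipeline JSON file.
    (Assumes a top-level "filters" key; adjust to the real format.)"""
    with open(path) as fh:
        return json.load(fh)["filters"]

def select_steps(steps, span=None, tags=None):
    """Pick a contiguous span of steps (by index) and/or a subset by tag."""
    if span is not None:
        first, last = span
        steps = steps[first:last + 1]
    if tags is not None:
        steps = [s for s in steps if tags & set(s.get("tags", []))]
    return steps

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("pipeline", help="pipeline JSON produced by the UI")
    parser.add_argument("--span", type=int, nargs=2, metavar=("FIRST", "LAST"),
                        help="only run this (inclusive) range of steps")
    parser.add_argument("--tags", type=lambda s: set(s.split(",")),
                        help="only run steps carrying one of these tags, e.g. 'gpu'")
    parser.add_argument("--input", type=argparse.FileType("r"), default=sys.stdin,
                        help="resume from partially filtered data (file or stdin)")
    parser.add_argument("--output", type=argparse.FileType("w"), default=sys.stdout,
                        help="write (possibly still partially filtered) output")
    args = parser.parse_args()

    steps = select_steps(load_pipeline(args.pipeline), args.span, args.tags)
    for step in steps:
        # Placeholder: the real runner would chain the filter subprocesses here,
        # piping args.input through each selected step into args.output.
        print(f"would run {step['filter']} (tags={step.get('tags', [])})",
              file=sys.stderr)

if __name__ == "__main__":
    main()
```

With something along those lines, a CPU node could run the cheap steps (say `--tags cpu`) and write the intermediate result to disk, and a GPU node could then resume from that file and run only the bicleaner-ai/LASER steps (`--tags gpu`).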

jelmervdl · Oct 06 '22 13:10