Resuming an interrupted pipeline
Hello! I am running the Dolma pipeline under the Slurm Workload Manager, which enforces a time limit on jobs.
Is it possible to resume the pipeline after it is interrupted, without rerunning the whole pipeline?
Have a look at the ignore_existing and metadata_prefix options; they should work for tagging, converting... You can set both from the config file.
When ignore_existing: false, the processor checks the metadata directory to see whether a file has already been processed and, if so, skips it. Make sure to set metadata_prefix to a fixed path in your config file; otherwise it is set to a different temporary directory each time your script runs, so nothing from the previous run is found and ignore_existing has no effect.
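To illustrate why the metadata path has to stay fixed between runs, here is a minimal sketch of the general "skip if a completion marker exists" pattern. It is not Dolma's actual implementation: process_all, the .done marker files, and the file names are hypothetical and chosen only for illustration; only the ignore_existing flag mirrors the behavior described above.

```python
import tempfile
from pathlib import Path


def process_all(inputs, metadata_dir, ignore_existing=False):
    """Process each input once, leaving a marker file per completed input.

    Hypothetical helper, not part of Dolma. With ignore_existing=False,
    inputs that already have a marker in metadata_dir are skipped.
    """
    metadata_dir = Path(metadata_dir)
    metadata_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    for name in inputs:
        marker = metadata_dir / f"{name}.done"
        if marker.exists() and not ignore_existing:
            continue  # finished in an earlier run, so skip it
        # ... the real per-file work (tagging, converting) would go here ...
        marker.write_text("ok")  # record completion only after the work is done
        processed.append(name)
    return processed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        fixed = Path(root) / "metadata"  # stands in for a fixed metadata path
        # First job: processes everything, then (say) hits the time limit.
        print(process_all(["a.jsonl", "b.jsonl"], fixed))
        # Second job with the same metadata path: only the new file is processed.
        print(process_all(["a.jsonl", "b.jsonl", "c.jsonl"], fixed))
```

If each run were given a fresh temporary directory instead of the same fixed path, no markers would ever be found and every file would be processed again, which is the situation described above.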
Hi! Thanks for the question. We’re currently working on closing out old tickets, and we apologize that we didn’t get to you in a timely fashion. We’re closing this out for now, but if you’d still like an answer, please re-open and we will get back to you!