Best practices when rerunning after job fails
I am loving MetaErg. It makes annotations much easier. Thank you!
Many of my jobs had to be canceled while running, unfortunately. Nothing to do with MetaErg.
I know you mentioned that we can rerun with the same parameters and it will continue where it left off, thanks to the tmp folder. However, it seems that we have to add the --force parameter for this to work. Is there anything else we should consider? Should we delete the last files if a file was not finished writing? Or do all of the scripts rewrite the folder instead of appending to it? Please share the best practices for rerunning after a failed job. Thank you!
When using the --force option, most scripts look for an existing output file and use that instead of running the tool again. This can lead to problems with tools like NHMMER, which write partial result files. One of my SGE jobs was killed during an unplanned reboot of its node, and this led to an invalid and incomplete output file. If you have kept the output log, you might be able to find the files the workflow was working on at the time and just replace them, but the safer option would be to rerun the workflow.
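If the output log is gone, file modification times can give a rough hint. The following is only a sketch of that heuristic in Python, not part of MetaErg: the function name `files_near_last_write` and the five-minute window are assumptions made for illustration. It lists the files in an output directory that were modified shortly before the most recent write, which are the likeliest candidates for having been open when the job was killed. It cannot prove that a file is incomplete.

```python
from pathlib import Path


def files_near_last_write(out_dir, window_seconds=300):
    """Return files modified within `window_seconds` of the newest file.

    Hypothetical helper: files written just before the job died are
    candidates for being partial, but this is a heuristic only.
    """
    files = [p for p in Path(out_dir).rglob("*") if p.is_file()]
    if not files:
        return []
    # Record each file's last-modification time once.
    mtimes = {p: p.stat().st_mtime for p in files}
    newest = max(mtimes.values())
    candidates = [p for p, t in mtimes.items() if newest - t <= window_seconds]
    # Most recently modified first.
    return sorted(candidates, key=lambda p: mtimes[p], reverse=True)
```

Calling `files_near_last_write("my_output_dir")` on the (hypothetical) output directory gives a short list to inspect by hand before deciding what to delete.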
Could you please list all of the tools in the workflow that write partial results? This would help me know whether or not to delete an output file. I would very much like to avoid restarting the workflow. Hopefully, this issue can be resolved in the next version of MetaErg?
Sadly, I am not a developer of MetaErg and I don't know what the future of this project is. The caching behavior might even differ between tool versions and is undocumented most of the time, but it usually just depends on the IO library used. Snakemake deals with this problem by having flag files that are written when a job finishes or fails, using && and ||. This could be added to the command calls, and the check for the output file could be replaced with a check for a flag that marks a successful run. I will write a patch/PR, because as I am writing this, I see how much I want this for myself as well.
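To make the flag-file idea concrete, here is a minimal sketch in Python. It is not MetaErg code and not the patch mentioned above: the function name `run_with_flags` and the `.done` / `.failed` suffixes are assumptions chosen for illustration. The point is that a step is skipped only when a success flag exists, so a partial output file left by a killed job is never reused.

```python
import subprocess
from pathlib import Path


def run_with_flags(cmd, output):
    """Run `cmd` unless a success flag for `output` already exists.

    Hypothetical helper: `<output>.done` is written only after the command
    exits with status 0, `<output>.failed` after a non-zero exit. A job that
    is killed writes neither, so the step is rerun next time.
    """
    output = Path(output)
    done = output.with_name(output.name + ".done")
    failed = output.with_name(output.name + ".failed")

    if done.exists():
        return "skipped"

    # No success flag: treat any existing output as partial and remove it.
    output.unlink(missing_ok=True)
    failed.unlink(missing_ok=True)

    result = subprocess.run(cmd)
    if result.returncode == 0:
        done.touch()      # same role as `cmd && touch output.done`
        return "ran"
    failed.touch()        # same role as `cmd || touch output.failed`
    return "failed"
```

Each tool call would be wrapped as `run_with_flags([...command...], "path/to/output")`. Rerunning the whole workflow then repeats only the steps that never reached their `.done` flag.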
Snakemake is a great option. A recent metagenomics pipeline uses Snakemake and it works wonders for me. https://sunbeam.readthedocs.io/en/latest/usage.html
Also, Snakemake makes it easier for me to modify parameters and select which programs to run.