rnaseq
Make pipeline truly resumable
Description of feature
When the pipeline has finished successfully once and is re-run with -resume, it does not truly resume. I'd expect the pipeline to recognize that all steps have already run successfully and to take the cached results without doing any computation. In reality, however, the mapping indices from Salmon (and maybe STAR) are somehow not saved when running in salmon_star mode. As a consequence, these indices have to be rebuilt with -resume, and all downstream steps (mapping, etc.) are then re-run as well because the upstream index construction step was re-run. This wastes computational resources, and users have to wait longer for results that were already computed successfully before.
That is not really what the -resume functionality is meant for. It enables fixing issues along the way that have caused a workflow execution to grind to a halt. It is not meant to preserve computation results/assets for entirely different runs once the execution finished successfully.
Please use the --save_reference parameter and specify the resulting files as input parameters (--star_index, --salmon_index) for subsequent runs.
Agree. Nextflow is inherently only resumable at the run level and not across multiple runs. If you want to store assets for the longer term, most pipelines have parameters like --save_reference to store these files in more permanent storage for re-use.
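
For anyone landing here later, the two-run workflow suggested above could look roughly like the sketch below. It only builds and prints the command lines; it does not run Nextflow. The helper name rnaseq_command, the sample sheet, the FASTA/GTF file names, and the index paths under results_run1 are all assumptions for illustration. Check the output directory of your first run for where the indices were actually saved.

```python
import shlex


def rnaseq_command(input_csv, outdir, fasta, gtf,
                   star_index=None, salmon_index=None):
    """Build a `nextflow run nf-core/rnaseq` command as a list of arguments.

    Hypothetical helper for illustration; all paths are placeholders.
    """
    cmd = [
        "nextflow", "run", "nf-core/rnaseq",
        "--input", input_csv,
        "--outdir", outdir,
        "--fasta", fasta,
        "--gtf", gtf,
        "--aligner", "star_salmon",
    ]
    if star_index is None and salmon_index is None:
        # First run: no indices exist yet, so ask the pipeline to keep
        # the ones it builds in the output directory.
        cmd.append("--save_reference")
    else:
        # Later runs: point the pipeline at the saved indices so they
        # are not rebuilt.
        if star_index is not None:
            cmd += ["--star_index", star_index]
        if salmon_index is not None:
            cmd += ["--salmon_index", salmon_index]
    return cmd


# First run: builds the indices and saves them.
first = rnaseq_command("samplesheet.csv", "results_run1",
                       "genome.fa", "genes.gtf")

# Second run: reuses the saved indices (assumed locations, verify locally).
second = rnaseq_command(
    "samplesheet.csv", "results_run2", "genome.fa", "genes.gtf",
    star_index="results_run1/genome/index/star",
    salmon_index="results_run1/genome/index/salmon",
)

print(shlex.join(first))
print(shlex.join(second))
```

The point of the split is that the second invocation is an independent run: it does not rely on -resume or on the work directory of the first run, only on the index files that were written to permanent storage.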