Add option to split SPAdes read correction into separate process or enable SPAdes checkpoints
Description of feature
When running metaSPAdes as part of nf-core/mag, the first step is read correction, followed by the actual assembly steps. With the sensible default resource settings of nf-core/mag, SPAdes might run out of memory for large samples with a lot of sequencing data. When the step is restarted, SPAdes starts from scratch and performs the read correction again, even if it completed successfully in the previous attempt.
The read correction step is rather time consuming and can take more than 15 hours for samples with more than 100 million reads. However, it often has slightly lower memory requirements than the actual assembly steps. Restarting with read correction each time SPAdes fails due to low memory in the assembly step seems to me a waste of resources and computing time. The same is true of simply running all samples with high memory requirements by default.
There are two possible solutions to avoid this dilemma:
- SPAdes allows restarting from checkpoints, i.e. the last completed step, and therefore would not re-run read correction if this step finished successfully in a previous attempt. However, with my limited knowledge of Nextflow, I assume this might be tricky given that a new temporary folder is created for each process.
- The process SPAdes is split into SPADES_READCORRECTION and SPADES_ASSEMBLY. SPADES_ASSEMBLY would still run from the files produced by SPADES_READCORRECTION, but it would avoid rerunning the read correction in case the assembly step fails.
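To make the two options more concrete, here is a rough sketch of the SPAdes command lines each one might boil down to, written as small Python helpers that only build the argument lists. It assumes the SPAdes options `--only-error-correction`, `--only-assembler` and `--restart-from last` behave as described in the SPAdes manual. The file names, output directories, and resource values are placeholders I made up, and how the real nf-core/mag module would pass files between processes is not covered.

```python
import shlex
from typing import List


def resources(threads: int, memory_gb: int) -> List[str]:
    """Thread and memory limit flags shared by all commands."""
    return ["-t", str(threads), "-m", str(memory_gb)]


def read_correction_cmd(r1: str, r2: str, outdir: str,
                        threads: int, memory_gb: int) -> List[str]:
    """Option 2, first half: what a SPADES_READCORRECTION process might run."""
    return ["metaspades.py", "--only-error-correction",
            "-1", r1, "-2", r2, "-o", outdir] + resources(threads, memory_gb)


def assembly_cmd(cor_r1: str, cor_r2: str, outdir: str,
                 threads: int, memory_gb: int) -> List[str]:
    """Option 2, second half: what a SPADES_ASSEMBLY process might run.

    cor_r1 / cor_r2 are the corrected reads produced by the first step
    (placeholder paths; the real file names depend on the SPAdes output).
    """
    return ["metaspades.py", "--only-assembler",
            "-1", cor_r1, "-2", cor_r2, "-o", outdir] + resources(threads, memory_gb)


def restart_cmd(outdir: str, threads: int, memory_gb: int) -> List[str]:
    """Option 1: resume a failed run from its last checkpoint.

    This only works if `outdir` from the failed attempt still exists,
    which is the tricky part when every attempt gets a fresh work folder.
    """
    return ["spades.py", "--restart-from", "last",
            "-o", outdir] + resources(threads, memory_gb)


if __name__ == "__main__":
    # Placeholder inputs, for illustration only.
    print(shlex.join(read_correction_cmd("s_R1.fastq.gz", "s_R2.fastq.gz", "ec", 8, 64)))
    print(shlex.join(assembly_cmd("cor_R1.fastq.gz", "cor_R2.fastq.gz", "asm", 8, 128)))
    print(shlex.join(restart_cmd("asm", 8, 256)))
```

With the split, only the assembly command would need to be retried with a higher `-m` value after an out-of-memory failure, while the corrected reads from the first step are reused as they are.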