Add option to split SPAdes read correction into separate process or enable SPAdes checkpoints
Description of feature
When running metaSPAdes as part of nf-core/mag, the first step is the read correction followed by the actual assembly steps. When using the sensible default resource settings of nf-core/mag to run SPAdes, SPAdes might run out of memory for large samples with a lot of sequencing data. Upon re-starting the step, SPAdes will then start from scratch and first perform the read correction again, even if this was successful in the previous attempt.
The read correction step is rather time-consuming and can take more than 15 hours for samples with more than 100 million reads. However, it often has slightly lower memory requirements than the actual assembly steps. Restarting with read correction each time SPAdes fails due to low memory in the assembly step seems to me a waste of resources and computing time. The same is true of simply running all samples with high memory requirements by default.
There are two possible solutions to avoid this dilemma:
- SPAdes allows restarting from checkpoints, i.e. the last completed step, and would therefore not re-run read correction if that step finished successfully in a previous attempt. However, despite my limited knowledge of Nextflow, I assume this might be tricky given that a new temporary folder is created for each process.
- The process SPAdes is split into SPADES_READCORRECTION and SPADES_ASSEMBLY. SPADES_ASSEMBLY would still run from the files produced by SPADES_READCORRECTION, but it would avoid rerunning the read correction in case the assembly step fails.
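
To make the two options a bit more concrete, here is a rough sketch of the SPAdes command lines each one would boil down to. The helper function names, file paths, and resource numbers are made up for illustration and are not taken from nf-core/mag. The flags themselves (`--only-error-correction`, `--only-assembler`, `--restart-from`) are, as far as I know, regular SPAdes options, but whether they behave well inside a Nextflow work directory is exactly the open question.

```python
from typing import List


def readcorrection_cmd(r1: str, r2: str, outdir: str,
                       threads: int, memory_gb: int) -> List[str]:
    """Hypothetical SPADES_READCORRECTION step: run read correction only."""
    return ["spades.py", "--meta", "--only-error-correction",
            "-1", r1, "-2", r2, "-o", outdir,
            "-t", str(threads), "-m", str(memory_gb)]


def assembly_cmd(cor_r1: str, cor_r2: str, outdir: str,
                 threads: int, memory_gb: int) -> List[str]:
    """Hypothetical SPADES_ASSEMBLY step: assemble already corrected reads.

    The corrected read paths are passed in by the caller; the file names
    that SPAdes writes are not assumed here.
    """
    return ["spades.py", "--meta", "--only-assembler",
            "-1", cor_r1, "-2", cor_r2, "-o", outdir,
            "-t", str(threads), "-m", str(memory_gb)]


def restart_cmd(outdir: str, memory_gb: int) -> List[str]:
    """Checkpoint option: resume an existing output folder with more memory.

    Assumes the output folder of the failed attempt is still available,
    which is the tricky part with a fresh work directory per attempt.
    """
    return ["spades.py", "--restart-from", "last",
            "-m", str(memory_gb), "-o", outdir]


if __name__ == "__main__":
    # Illustrative values only.
    print(" ".join(readcorrection_cmd("s1_R1.fq.gz", "s1_R2.fq.gz", "ec_out", 8, 64)))
    print(" ".join(restart_cmd("spades_out", 256)))
```

With the split, only the assembly command would need to be retried with a higher `-m` value, while the read correction output would be reused as a normal process input.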