mag
Remove inefficient basic file handling pipeline steps
Description of feature
The following processes only unpack tar archives or perform basic file handling. Removing some or all of them would make the pipeline considerably more efficient:
- ADJUST_MAXBIN2_EXT
- BUSCO_DB_PREPARATION
- BUSCO_SAVE_DOWNLOAD
- CAT_DB_GENERATE
- CAT_DB
- CAT_SUMMARY
- CENTRIFUGE_DB_PREPARATION
- GTDBTK_DB_PREPARATION
- KRAKEN2_DB_PREPARATION
- KRONA_DB
- POOL_PAIRED_READS
- POOL_SINGLE_READS
- RENAME_POSTDASTOOL
- RENAME_PREDASTOOL
All of these steps can be handled directly in Nextflow, although some may require upstream changes in nf-core/modules.
The following processes could be simplified, altered, or perhaps removed, but each will have to be checked on a case-by-case basis:
- CAT
- COMBINE_TSV
- CONVERT_DEPTHS
- QUAST_BINS_SUMMARY
- SPADESHYBRID
- TIARA_CLASSIFY
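As an illustration of the kind of simplification meant here, a step that only concatenates tables could in principle be replaced by Nextflow's `collectFile` operator instead of a dedicated process. This is a minimal sketch, not the pipeline's actual code: the channel name `ch_summaries`, the input glob, and the output file name are hypothetical, and it assumes every TSV starts with the same single header line. Whether this matches what a process such as COMBINE_TSV really does would need to be checked first.

```nextflow
// Hypothetical sketch: merge per-sample TSV files with a channel operator
// rather than a separate process. All names and paths are assumptions.
workflow {
    // Assumed input: one TSV per sample, each with an identical one-line header
    ch_summaries = Channel.fromPath('results/*.tsv')

    ch_summaries
        // keepHeader keeps the header of the first file; skip: 1 sets the
        // header size to one line, so it is dropped from every later file
        .collectFile(name: 'combined_summary.tsv', keepHeader: true, skip: 1)
        .view()
}
```

Because the merge happens inside the workflow itself, no container is pulled and no job is submitted for it, which is where the efficiency gain would come from.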
Related issues:
- https://github.com/nf-core/mag/issues/502
- https://github.com/nf-core/mag/issues/474
- https://github.com/nf-core/mag/issues/462