
Remove inefficient basic file handling pipeline steps

Open adamrtalbot opened this issue 2 years ago • 0 comments

Description of feature

The following processes only unpack tar archives or perform basic file handling. Removing some or all of them would make the pipeline much more efficient:

  • ADJUST_MAXBIN2_EXT
  • BUSCO_DB_PREPARATION
  • BUSCO_SAVE_DOWNLOAD
  • CAT_DB_GENERATE
  • CAT_DB
  • CAT_SUMMARY
  • CENTRIFUGE_DB_PREPARATION
  • GTDBTK_DB_PREPARATION
  • KRAKEN2_DB_PREPARATION
  • KRONA_DB
  • POOL_PAIRED_READS
  • POOL_SINGLE_READS
  • RENAME_POSTDASTOOL
  • RENAME_PREDASTOOL

All of these steps can be handled natively in Nextflow, although doing so may require upstream changes in nf-core/modules.
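As a rough illustration of the idea, a rename-only step can often be folded into the process that consumes the file by using the `stageAs` option of a `path` input. This is a minimal sketch, not code from nf-core/mag: the process name `COUNT_CONTIGS`, the `bins/*.fasta` glob, and the staged name `bin.fa` are all hypothetical.

```nextflow
// Hypothetical example: COUNT_CONTIGS and all file names are illustrative only.
process COUNT_CONTIGS {
    input:
    // Stage the incoming file under a fixed name in the task directory,
    // so no separate rename process is needed upstream.
    path(fasta, stageAs: 'bin.fa')

    output:
    stdout

    script:
    """
    grep -c '>' bin.fa
    """
}

workflow {
    // Each matching FASTA file is staged as bin.fa inside its own task.
    COUNT_CONTIGS(Channel.fromPath('bins/*.fasta'))
    COUNT_CONTIGS.out.view()
}
```

Because staging happens when the task directory is set up, this avoids launching an extra job whose only purpose is to rename a file. Whether it fits a given step depends on what the downstream tool expects.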

The following processes could be simplified, altered, or perhaps removed, but each will need to be checked case by case:

  • CAT
  • COMBINE_TSV
  • CONVERT_DEPTHS
  • QUAST_BINS_SUMMARY
  • SPADESHYBRID
  • TIARA_CLASSIFY
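For steps in this second group that turn out to only concatenate tables, the `collectFile` channel operator may be enough. The sketch below assumes, without having checked the actual process, that the inputs are TSV files sharing a one-line header; the paths and output name are hypothetical.

```nextflow
// Hypothetical example: the input glob, output name, and output directory are illustrative only.
workflow {
    Channel
        .fromPath('results/*.tsv')
        // Concatenate all files into one; keep the header from the first file
        // and skip the first line of every other file.
        .collectFile(name: 'combined.tsv', keepHeader: true, skip: 1, storeDir: 'results/summary')
        .view { f -> "Combined table written to ${f}" }
}
```

This runs inside the Nextflow head process rather than as a separate task, which is the kind of saving the issue is after. It would not be suitable if the step also reorders, filters, or validates rows.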

Related issues:

  • https://github.com/nf-core/mag/issues/502
  • https://github.com/nf-core/mag/issues/474
  • https://github.com/nf-core/mag/issues/462

adamrtalbot, Sep 07 '23 09:09