Add ability to merge FastQs
It would be useful to merge files from multiple lanes that belong to the same sample. Samples are often sequenced across multiple lanes, and aligning each lane separately does speed up the alignment step, but the results then need to be merged at the BAM level.
I think it would be very useful to add this module, as in the sarek pipeline:
https://github.com/nf-core/sarek/tree/dev/modules/nf-core/modules/samtools/merge
I believe that this is already implemented in the dsl2 branch and will be included in the next release.
I don't see it in the dsl2 branch yet:
https://github.com/nf-core/methylseq/tree/dsl2/modules/nf-core/modules/samtools
You're right, apologies. I know we've talked about adding it previously so it's definitely on the roadmap.
Note that we usually just use cat rather than samtools in other pipelines: https://nf-co.re/modules/cat_fastq
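For anyone wondering why plain cat is enough: gzip allows several compressed members back to back in one file, so lane files can be joined byte-for-byte without decompressing. Below is a minimal Python sketch of that idea, not the CAT_FASTQ module itself. The function names and file names are hypothetical and only there for illustration.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def cat_fastq(lane_files, merged_path):
    """Concatenate gzipped FASTQ files byte-for-byte, like `cat a.gz b.gz > out.gz`."""
    with open(merged_path, "wb") as out:
        for lane_file in lane_files:
            with open(lane_file, "rb") as src:
                shutil.copyfileobj(src, out)
    return merged_path


def demo():
    """Write two one-read lane files, merge them, and return the merged read names."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        lanes = []
        for lane in (1, 2):
            # Hypothetical lane file names, one read each.
            path = tmp / f"sample_L00{lane}_R1.fastq.gz"
            with gzip.open(path, "wt") as fh:
                fh.write(f"@read{lane}\nACGT\n+\nIIII\n")
            lanes.append(path)
        merged = cat_fastq(lanes, tmp / "sample_R1.merged.fastq.gz")
        # The gzip module reads all concatenated members as one stream.
        with gzip.open(merged, "rt") as fh:
            lines = fh.read().splitlines()
        # Every fourth line, starting at the first, is a read header.
        return lines[0::4]
```

For paired-end data the R1 and R2 files would need to be concatenated separately and in the same lane order, so that read pairs stay in sync.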
Example of lane merging in the nf-core/taxprofiler pipeline here. (Thanks for the pointer @Midnighter 👍🏻 )
Another example from the rnaseq pipeline here.
NB: Neither of these pipelines has the same check_samplesheet.py script in its bin directory ⚠️ so we need to check the channel structure etc. (and the _T{n} suffix trimming)
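To make the suffix trimming concrete, here is a rough Python sketch of the grouping logic. It assumes each run of a sample gets an ID ending in `_T` plus a number, and that stripping it recovers the sample name. The sample IDs, file names and `group_runs` helper are made up for illustration, and the real channel logic would live in Nextflow rather than Python.

```python
import re
from collections import defaultdict

# Assumed run suffix: "_T" followed by digits at the very end of the ID.
RUN_SUFFIX = re.compile(r"_T\d+$")


def group_runs(rows):
    """Group (sample_id, fastq) rows by sample ID with the run suffix removed."""
    grouped = defaultdict(list)
    for sample_id, fastq in rows:
        grouped[RUN_SUFFIX.sub("", sample_id)].append(fastq)
    return dict(grouped)


# Hypothetical samplesheet rows after the per-run suffix has been added.
rows = [
    ("liver_T1", "liver_L001_R1.fastq.gz"),
    ("liver_T2", "liver_L002_R1.fastq.gz"),
    ("brain_T1", "brain_L001_R1.fastq.gz"),
]
groups = group_runs(rows)
```

Samples with more than one file in their group would go through the merge step, and single-file samples could skip it.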
NB: The documentation actually says that you can already do this 🙈 but it's not true: https://nf-co.re/methylseq/dev/usage#multiple-runs-of-the-same-sample
So we really need to implement this ASAP.
Should these be merged at the FASTQ level after INPUT_CHECK using CAT_FASTQ (like the rnaseq pipeline), or at the .bam level after ALIGNER using SAMTOOLS_MERGE, in case aligning the lanes separately before combining them is faster?
Done in https://github.com/nf-core/methylseq/pull/272
Is this now available in the regular pipeline version (not the development one)? Thanks.