
Add ability to merge FastQs

Open bounlu opened this issue 3 years ago • 3 comments

It would be useful to be able to merge files from multiple lanes that belong to the same sample. Samples are often sequenced across multiple lanes, and aligning each lane separately does speed up the alignment step, but the results then need to be merged at the BAM level.

I think it will be very useful to add this module just like in the sarek pipeline: https://github.com/nf-core/sarek/tree/dev/modules/nf-core/modules/samtools/merge

bounlu avatar Jun 29 '22 06:06 bounlu

I believe that this is already implemented in the dsl2 branch and will be included in the next release.

ewels avatar Jun 29 '22 07:06 ewels

I don't see it in dsl2 branch yet: https://github.com/nf-core/methylseq/tree/dsl2/modules/nf-core/modules/samtools

bounlu avatar Jun 29 '22 07:06 bounlu

You're right, apologies. I know we've talked about adding it previously so it's definitely on the roadmap.

Note that we usually just use cat rather than samtools in other pipelines: https://nf-co.re/modules/cat_fastq
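As a rough illustration of why plain concatenation is enough at the FASTQ level: a gzip stream may contain several members, so gzipped lane files can be joined byte-for-byte without decompressing them. The sketch below is not the cat_fastq module itself; the function name `merge_fastq_gz` and the file names are hypothetical and only mimic what `cat lane1.fastq.gz lane2.fastq.gz > merged.fastq.gz` does.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def merge_fastq_gz(lane_files, merged_path):
    """Concatenate gzipped FASTQ files byte-for-byte (hypothetical helper)."""
    with open(merged_path, "wb") as out:
        for lane_file in lane_files:
            with open(lane_file, "rb") as src:
                # Raw copy: no decompression, each input stays its own gzip member.
                shutil.copyfileobj(src, out)
    return merged_path


if __name__ == "__main__":
    # Build two tiny per-lane files (made-up names), then merge and read back.
    workdir = Path(tempfile.mkdtemp())
    lanes = []
    for lane in (1, 2):
        path = workdir / f"sample_L00{lane}_R1.fastq.gz"
        with gzip.open(path, "wt") as fh:
            fh.write(f"@read{lane}\nACGT\n+\nIIII\n")
        lanes.append(path)

    merged = merge_fastq_gz(lanes, workdir / "sample_R1.merged.fastq.gz")
    with gzip.open(merged, "rt") as fh:
        print(fh.read())  # records from lane 1 followed by lane 2
```

For paired-end data the same call would be made once for the R1 files and once for the R2 files, with the lanes listed in the same order both times so that read pairs stay in sync.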

ewels avatar Jun 29 '22 12:06 ewels

Example of lane merging in the nf-core/taxprofiler pipeline here. (Thanks for the pointer @Midnighter 👍🏻 )

ewels avatar Nov 08 '22 15:11 ewels

Another example from the rnaseq pipeline here.

ewels avatar Nov 08 '22 15:11 ewels

NB: Neither of these pipelines has the same check_samplesheet.py script in its bin directory ⚠️ so we need to check the channel structure etc. (and the _T{n} suffix trimming)
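To make the suffix-trimming idea concrete, here is a minimal sketch in Python rather than Nextflow. It assumes run-level IDs of the form `<sample>_T<n>`; the function name `group_runs_by_sample`, the sample IDs, and the file names are all hypothetical, and the real channel structure in methylseq may differ.

```python
import re
from collections import defaultdict

# Assumed convention: each run of a sample carries a trailing "_T<n>" suffix.
RUN_SUFFIX = re.compile(r"_T\d+$")


def group_runs_by_sample(run_to_fastqs):
    """Strip the run suffix and collect each sample's FASTQ files (hypothetical helper)."""
    grouped = defaultdict(list)
    for run_id, fastqs in run_to_fastqs.items():
        sample_id = RUN_SUFFIX.sub("", run_id)
        grouped[sample_id].extend(fastqs)
    return dict(grouped)


if __name__ == "__main__":
    runs = {
        "SAMPLE_A_T1": ["a_L001.fastq.gz"],
        "SAMPLE_A_T2": ["a_L002.fastq.gz"],
        "SAMPLE_B_T1": ["b_L001.fastq.gz"],
    }
    print(group_runs_by_sample(runs))
    # {'SAMPLE_A': ['a_L001.fastq.gz', 'a_L002.fastq.gz'], 'SAMPLE_B': ['b_L001.fastq.gz']}
```

Samples with more than one file after grouping are the ones that would need merging; single-run samples can pass through unchanged.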

ewels avatar Nov 08 '22 15:11 ewels

NB: The documentation actually says that you can already do this 🙈 but it's not true.. https://nf-co.re/methylseq/dev/usage#multiple-runs-of-the-same-sample

So we really need to implement this ASAP.

ewels avatar Nov 08 '22 16:11 ewels

Should these be merged at the FASTQ level after INPUT_CHECK using CAT_FASTQ (like the rnaseq pipeline), or at the BAM level after ALIGNER using SAMTOOLS_MERGE, in case aligning each lane separately before combining them is faster?

SpikyClip avatar Nov 16 '22 01:11 SpikyClip

Done in https://github.com/nf-core/methylseq/pull/272

ewels avatar Nov 28 '22 23:11 ewels

NB: The documentation actually says that you can already do this 🙈 but it's not true.. https://nf-co.re/methylseq/dev/usage#multiple-runs-of-the-same-sample

So we really need to implement this ASAP.

Is this now available in the regular pipeline version? (not the development one). Thanks.

paulicm-UCD avatar Jan 12 '24 17:01 paulicm-UCD

Yes, the PR that was merged has been present in v2.2.0 onwards 👍🏻

ewels avatar Jan 12 '24 23:01 ewels