methylseq

`samtools sort` overwrites

Open bounlu opened this issue 2 years ago • 0 comments

I want to extract the deduplicated and sorted BAM file from the pipeline when using the Bismark aligner.

When I use the `--save_align_intermeds` flag, the SAMTOOLS_SORT module writes the sorted BAM file to the output directory according to this config:

    withName: SAMTOOLS_SORT {
        ext.prefix = { "${meta.id}.sorted" }
        publishDir = [
            [
                path: { "${params.outdir}/${params.aligner}/deduplicated/" },
                mode: params.publish_dir_mode,
                pattern: "*markdup*.bam",
                enabled: params.save_align_intermeds
            ],
            [
                path: { "${params.outdir}/${params.aligner}/alignments/" },
                mode: params.publish_dir_mode,
                pattern: "*.bam",
                enabled: params.save_align_intermeds
            ]
        ]
    }

However, SAMTOOLS_SORT is used twice, under an alias, in the BISMARK workflow: once before and once after deduplication. The sorted file from before deduplication is saved first, and the sorted file from after deduplication then overwrites it, because both get the same prefix and therefore the same filename after sorting. This collision should be avoided by giving the latter a distinct name, for example with `.deduplicated.sorted.` in it.

Moreover, the file that does the overwriting is written to the wrong folder: /alignments/ instead of /deduplicated/.
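
A possible fix could look roughly like the sketch below: give each aliased process its own selector, prefix, and publish directory. The alias names `SAMTOOLS_SORT_ALIGNED` and `SAMTOOLS_SORT_DEDUPLICATED` are hypothetical placeholders, since I have not checked what the aliases in the BISMARK workflow are actually called. The `*.deduplicated.sorted.bam` pattern is also an assumption that follows from the suggested prefix.

```groovy
// Sketch only: the two alias names below are hypothetical and must be
// replaced with the aliases actually used in the BISMARK workflow.
process {
    // Sort before deduplication: keeps the current name, goes to alignments/
    withName: SAMTOOLS_SORT_ALIGNED {
        ext.prefix = { "${meta.id}.sorted" }
        publishDir = [
            path: { "${params.outdir}/${params.aligner}/alignments/" },
            mode: params.publish_dir_mode,
            pattern: "*.bam",
            enabled: params.save_align_intermeds
        ]
    }

    // Sort after deduplication: distinct prefix, goes to deduplicated/
    withName: SAMTOOLS_SORT_DEDUPLICATED {
        ext.prefix = { "${meta.id}.deduplicated.sorted" }
        publishDir = [
            path: { "${params.outdir}/${params.aligner}/deduplicated/" },
            mode: params.publish_dir_mode,
            pattern: "*.deduplicated.sorted.bam",
            enabled: params.save_align_intermeds
        ]
    }
}
```

With separate selectors, the two sorted files no longer share a filename, and the folder is chosen per process instead of relying on the `*markdup*.bam` pattern matching the output name.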

bounlu, Jun 21 '23 07:06