Disabled --no-discordant for structural variations

Open MimoriK opened this issue 2 years ago • 1 comments

Hi Felix,

Alignments should not be really discordant unless something weird happened during the library preparation, or possibly if you have structural variation or translocations in your genome. We try to focus on the less weird parts of the genome, which is why --no-mixed and --no-discordant are on by default.

I tried to use Bismark to align BS-seq FastQ files from samples with structural variations, using the following command: bismark --genome_folder ~/RefGenome --parallel 6 -I 0 -X 1000 --local --unmapped --ambiguous --ambig_bam --bowtie2 -1 R1.fq.gz -2 R2.fq.gz --temp_dir ./tmp --output_dir ./bismark

Although soft-clipped reads exist in the output BAM, all of them belong to concordant read pairs. I know that calling structural variations in methylation data is difficult, and that discordant reads often have multiple alignments with low mapping quality, but we are trying to develop such callers. Could Bismark provide discordant reads as output?

Thank you in advance, Ruth

MimoriK avatar Dec 13 '22 03:12 MimoriK

Hi Ruth,

The options --no-mixed and --no-discordant are hard-coded into the calls to Bowtie2 (or other aligners), and I am afraid the entire logic of processing paired-end reads relies on the fact that only concordant reads get reported. Changing this logic to deal well with non-concordant reads would require a substantial rewrite, which I am afraid I won't have time to do at the moment.

Something of a workaround that might work for you is to run a command like the one above, specifying --unmapped. This essentially aligns all concordant read pairs to the genome and writes the read pairs that did not align to separate FastQ files, thereby depleting your libraries of the more 'boring' reference-genome-like reads.

You could then align the remaining, unmapped FastQ files in single-end mode (default alignments for unmapped_R1, and --pbat alignments for unmapped_R2) and attempt to pair them back up afterwards, in what we called a Hi-C-like approach (see also this blog post: https://sequencing.qcfail.com/articles/pbat-libraries-may-generate-chimaeric-read-pairs/).
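The "pair them back up afterwards" step is not something Bismark does for you. As a rough sketch only: the Python below assumes you have already extracted (read ID, chromosome, position, strand) from the two single-end BAM files by whatever means you prefer. The names Hit, base_name, pair_up and classify are hypothetical helpers, the stripping of a trailing /1 or /2 is an assumption about how the read IDs look, and the 1000 bp threshold simply mirrors the -X 1000 used in the command above. Read orientation is ignored here.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One single-end alignment (hypothetical minimal record)."""
    name: str    # read ID as reported in the single-end BAM
    chrom: str   # reference sequence name
    pos: int     # 1-based leftmost mapped position
    strand: str  # "+" or "-"


def base_name(read_id: str) -> str:
    """Strip a trailing /1 or /2 so both mates share one key (assumed naming)."""
    if read_id.endswith(("/1", "/2")):
        return read_id[:-2]
    return read_id


def pair_up(r1_hits: Iterable[Hit], r2_hits: Iterable[Hit]) -> List[Tuple[Hit, Hit]]:
    """Re-pair single-end hits of R1 and R2 by their shared read name."""
    r2_by_name = {base_name(h.name): h for h in r2_hits}
    pairs = []
    for h1 in r1_hits:
        h2 = r2_by_name.get(base_name(h1.name))
        if h2 is not None:  # both mates aligned in single-end mode
            pairs.append((h1, h2))
    return pairs


def classify(h1: Hit, h2: Hit, max_insert: int = 1000) -> str:
    """Crude label for a re-paired read pair; the threshold is an assumption."""
    if h1.chrom != h2.chrom:
        return "interchromosomal"
    if abs(h1.pos - h2.pos) > max_insert:
        return "distant"
    return "nearby"
```

Pairs labelled "interchromosomal" or "distant" are only candidates for discordant pairs; they would still need filtering on mapping quality and orientation before being handed to a structural variation caller.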

FelixKrueger avatar Dec 13 '22 12:12 FelixKrueger

Thanks for your reply!

I tried the above method and got more discordant reads. Bismark's mapping is accurate and stable. In single-end mode, Bismark provides optimal alignments for the reads that were left unmapped in paired-end mode. Excellent!

Another question: does Bismark plan to provide supplementary alignments for soft-clipped reads, like the SA tag, in the future? I didn't find any multiple alignments in the BAM.
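For anyone wanting to repeat this check, here is a minimal sketch of how one might count soft-clipped records and see whether any carry an SA tag. It works on plain SAM text lines (for example the output of samtools view); the function names are hypothetical, and the example records in the usage note are synthetic, not real Bismark output.

```python
import re
from typing import Iterable, Optional, Tuple


def has_soft_clip(cigar: str) -> bool:
    """True if the CIGAR string contains a soft-clip (S) operation."""
    return re.search(r"\d+S", cigar) is not None


def sa_tag(sam_line: str) -> Optional[str]:
    """Return the value of the SA:Z: optional field, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    for opt in fields[11:]:  # optional fields start at column 12
        if opt.startswith("SA:Z:"):
            return opt[5:]
    return None


def summarize(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Count soft-clipped records and how many of those carry an SA tag."""
    clipped = 0
    with_sa = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        cigar = line.split("\t")[5]  # CIGAR is column 6
        if has_soft_clip(cigar):
            clipped += 1
            if sa_tag(line) is not None:
                with_sa += 1
    return clipped, with_sa
```

Calling summarize on an iterable of SAM lines returns a pair (soft-clipped records, soft-clipped records with an SA tag). A second count of zero alongside a non-zero first count corresponds to the observation above: soft-clipped reads are present, but no supplementary alignments are reported.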

MimoriK avatar Dec 15 '22 03:12 MimoriK

Regarding your last question, I am afraid there are currently no plans to add support for supplementary alignments.

I am glad however that the method above proved useful for your purposes!

FelixKrueger avatar Dec 15 '22 17:12 FelixKrueger