
How is DHSP computed?

Open raul-w opened this issue 5 years ago • 2 comments

Hi Brent,

How is the count of spanning read-pairs (DHSP) defined in duphold? Is it the number of discordantly aligned read pairs that flank a deletion event? If so, I find it odd that deletions generally get a DHSP of 0, even though discordantly aligned read pairs were one of the signals used to call them. Furthermore, IGV clearly shows that several of these deletions are flanked by read pairs with a significantly larger insert size than expected.

Thanks for your time.

raul-w, Nov 30 '18 09:11

The DHSP should be a conservative (but fairly accurate) count of discordant reads that support the event. If you have an example where that's not the case, I'll take a look.

brentp, Nov 30 '18 21:11

Hi Brent,

Here is a small test case: https://drive.google.com/file/d/1cIRuXAZC2kN15K__UPxaCAzRIs6t1n-h/view?usp=sharing

The output file (test_output.vcf) was produced by duphold v.0.1.1. The command used to produce this file was:

duphold -t 4 -v test.vcf -b test.bam -f test.fa -o test_output.vcf

The output file contains several deletion events that were annotated with a DHSP of 0 but are clearly flanked by discordantly aligned read pairs in test.discordants.bam (produced by the speedseq align command). A clear example is the deletion event that covers the region 28773-37022 on SL3.0ch02:

[Image: region_many_discordants_in_bam_zero_in_vcf]

The event is shown in the middle of the image. The top panel shows the alignments of all reads (test.bam) and the bottom panel shows the alignments of all discordantly aligned read pairs (test.discordants.bam).
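
To make "discordant pairs flanking the deletion" concrete, here is a minimal sketch of the kind of count I would expect. It is not duphold's actual implementation: the `ReadPair` type, the `max_insert` and `slop` thresholds, and all read coordinates are assumptions made up for illustration. Only the deletion interval 28773-37022 comes from the test case.

```python
from typing import Iterable, NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical container for the alignment coordinates of one read pair."""
    left_start: int
    left_end: int
    right_start: int
    right_end: int


def count_spanning_pairs(
    pairs: Iterable[ReadPair],
    del_start: int,
    del_end: int,
    max_insert: int = 1000,  # assumed upper bound for a concordant insert size
    slop: int = 300,         # assumed tolerance around each breakpoint
) -> int:
    """Count discordant pairs whose mates sit on either side of a deletion."""
    count = 0
    for pair in pairs:
        # A pair is treated as discordant when its outer span exceeds max_insert.
        is_discordant = (pair.right_end - pair.left_start) > max_insert
        # The left mate must end near the left breakpoint ...
        left_ok = abs(pair.left_end - del_start) <= slop
        # ... and the right mate must start near the right breakpoint.
        right_ok = abs(pair.right_start - del_end) <= slop
        if is_discordant and left_ok and right_ok:
            count += 1
    return count


# Made-up pairs around the deletion at 28773-37022 (coordinates are illustrative).
example_pairs = [
    ReadPair(28500, 28650, 37100, 37250),  # discordant, flanks both breakpoints
    ReadPair(28600, 28750, 37050, 37200),  # discordant, flanks both breakpoints
    ReadPair(30000, 30150, 30300, 30450),  # concordant pair inside the region
    ReadPair(10000, 10150, 37100, 37250),  # discordant, but left mate is far away
]

print(count_spanning_pairs(example_pairs, 28773, 37022))
```

With a definition along these lines, the deletion above would get a non-zero count from the pairs visible in test.discordants.bam, which is why the DHSP of 0 surprised me.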

raul-w, Dec 03 '18 08:12