Sniffles
Sniffles copied to clipboard
Overlapping deletions/insertions
Hi there,
I have noticed several times that there will be a cluster of small-ish deletions or insertions that appear to frequently overlap one another. What I can say with some certainty, based on assemblies, is that within these region there is generally just a single event covering the entire region labelled by all these small-ish calls.
For example:
chr1 | 160176 | Sniffles2.DEL.9AS0 | N | <DEL> | 60 | GT | PRECISE;SVTYPE=DEL;SVLEN=-174;END=160350;SUPPORT=3;COVERAGE=20,13,13,13,13;STRAND=+-;AF=0.231;STDEV_LEN=6.557;STDEV_POS=6.351 | GT:GQ:DR:DV | 0/1:2:10:3
chr1 | 160246 | Sniffles2.DEL.99S0 | N | <DEL> | 60 | PASS | IMPRECISE;SVTYPE=DEL;SVLEN=-105;END=160351;SUPPORT=5;COVERAGE=20,13,13,13,13;STRAND=+-;AF=0.385;STDEV_LEN=8.173;STDEV_POS=89.196 | GT:GQ:DR:DV | 0/1:27:8:5
chr1 | 160589 | Sniffles2.DEL.9CS0 | N | <DEL> | 60 | PASS | IMPRECISE;SVTYPE=DEL;SVLEN=-572;END=161161;SUPPORT=7;COVERAGE=13,13,13,13,13;STRAND=+-;AF=0.538;STDEV_LEN=31.525;STDEV_POS=29.533 | GT:GQ:DR:DV | 0/1:40:6:7
chr1 | 161141 | Sniffles2.DEL.A6S0 | N | <DEL> | 60 | GT | IMPRECISE;SVTYPE=DEL;SVLEN=-2810;END=163951;SUPPORT=3;COVERAGE=13,13,13,13,13;STRAND=-;AF=0.231;STDEV_LEN=286.728;STDEV_POS=133.910 | GT:GQ:DR:DV | 0/1:2:10:3
chr1 | 161242 | Sniffles2.DEL.ACS0 | N | <DEL> | 60 | PASS | PRECISE;SVTYPE=DEL;SVLEN=-4628;END=165870;SUPPORT=4;COVERAGE=13,13,13,13,13;STRAND=+;AF=0.308;STDEV_LEN=0.000;STDEV_POS=0.000 | GT:GQ:DR:DV | 0/1:14:9:4
chr1 | 163495 | Sniffles2.DEL.ABS0 | N | <DEL> | 60 | GT | IMPRECISE;SVTYPE=DEL;SVLEN=-2419;END=165914;SUPPORT=3;COVERAGE=13,13,13,13,18;STRAND=+-;AF=0.231;STDEV_LEN=369.793;STDEV_POS=99.594 | GT:GQ:DR:DV | 0/1:2:10:3
chr1 | 165058 | Sniffles2.DEL.AAS0 | N | <DEL> | 60 | PASS | IMPRECISE;SVTYPE=DEL;SVLEN=-869;END=165927;SUPPORT=4;COVERAGE=13,13,13,13,18;STRAND=+-;AF=0.308;STDEV_LEN=111.723;STDEV_POS=9.192 | GT:GQ:DR:DV | 0/1:14:9:4
chr1 | 166033 | Sniffles2.DEL.ADS0 | N | <DEL> | 60 | PASS | PRECISE;SVTYPE=DEL;SVLEN=-115;END=166148;SUPPORT=9;COVERAGE=13,13,13,13,18;STRAND=+-;AF=0.692;STDEV_LEN=0.000;STDEV_POS=0.000 | GT:GQ:DR:DV | 0/1:14:4:9
Above there are several deletions covering the regions 160176-165927 on chromosome 1, which is more simply just a single deletion of that entire region.
Are all these small calls and their overlaps informative or an issue?
Thanks a lot
Hello,
this may be caused by deletions being split up within an alignment across highly repetitive regions. We recommend running Sniffles2 with a tandem repeat annotations .bed file (which can be downloaded from the Sniffles GitHub, and supplied to --tandem-repeats), which will allow differentiated clustering across these regions.
Hope this helps! Moritz
Although it is a repetitive region, it doesn't technically contain tandem repeats. The region is a isolated transposon which contains inverted repeats on either end but they are only 300bp and separated by ~5.5kb. Do you think providing a bed file highlighting these regions would still help?
Sniffles uses different clustering parameters in the repeat regions to account for SVs being split up into multiple smaller events by read aligners - so yes, even though you are not technically dealing with tandem repeats, providing the regions as .bed file may help. Whether they will be detected as single SV in the end will however depend on multiple factors, including the standard deviation of the start positions of the individual SV signals. Our benchmarks are of course only for tandem repeats. Just as a tip in case you want to provide your own .bed file: I would recommend a single entry for the full region and to make sure it is coordinate sorted before supplying it to Sniffles.
Best, Moritz
Hi Moritz, Thanks for the help here
So I tried what you suggested, added a bed file (below) to the calling process encompassing the entire region containing the transposon. Some results below if you are interested.
What happened is a bit strange. A large deletion now is called prior to the actual deletion site and only the last deletion call remains within the deleted region from the non-bed file call set.
bed:
chr1 160176 165927
vcf:
chr1 154952 Sniffles2.DEL.78S0 N <DEL> 60 PASS IMPRECISE;SVTYPE=DEL;SVLEN=-5399;END=160351;SUPPORT=12;COVERAGE=19,19,20,13,13;STRAND=+-;AF=0.706;STDEV_LEN=13.248;STDEV_POS=57.913 GT:GQ:DR:DV 0/1:16:5:12
chr1 166033 Sniffles2.DEL.79S0 N <DEL> 60 PASS PRECISE;SVTYPE=DEL;SVLEN=-115;END=166148;SUPPORT=12;COVERAGE=13,13,13,13,18;STRAND=+-;AF=0.923;STDEV_LEN=4.806;STDEV_POS=0.000 GT:GQ:DR:DV 1/1:23:1:12
I then tried adjusting the region within the bed file. Increasing the borders +-2kb didn't change anything However decreasing the size +-2kb did. The deletion prior to the actual deletion is now smaller. The last deletion remains. Plus there are a few overlapping calls as before although fewer.
bed:
chr1 162000 164000
vcf:
chr1 158730 Sniffles2.DEL.7AS0 N <DEL> 60 PASS IMPRECISE;SVTYPE=DEL;SVLEN=-2497;END=161227;SUPPORT=12;COVERAGE=19,19,13,13,13;STRAND=+-;AF=0.800;STDEV_LEN=1102.503;STDEV_POS=203.876 GT:GQ:DR:DV 1/1:3:3:12
chr1 160233 Sniffles2.DEL.78S0 N <DEL> 60 PASS IMPRECISE;SVTYPE=DEL;SVLEN=-118;END=160351;SUPPORT=8;COVERAGE=20,13,13,13,13;STRAND=+-;AF=0.615;STDEV_LEN=14.716;STDEV_POS=63.604 GT:GQ:DR:DV 0/1:27:5:8
chr1 161242 Sniffles2.DEL.80S0 N <DEL> 60 PASS PRECISE;SVTYPE=DEL;SVLEN=-4628;END=165870;SUPPORT=4;COVERAGE=13,13,13,13,13;STRAND=+;AF=0.308;STDEV_LEN=0.000;STDEV_POS=0.000 GT:GQ:DR:DV 0/1:14:9:4
chr1 163495 Sniffles2.DEL.7FS0 N <DEL> 60 GT IMPRECISE;SVTYPE=DEL;SVLEN=-2419;END=165914;SUPPORT=3;COVERAGE=13,13,13,13,18;STRAND=+-;AF=0.231;STDEV_LEN=369.793;STDEV_POS=99.594 GT:GQ:DR:DV 0/1:2:10:3
chr1 165058 Sniffles2.DEL.7ES0 N <DEL> 60 PASS IMPRECISE;SVTYPE=DEL;SVLEN=-869;END=165927;SUPPORT=4;COVERAGE=13,13,13,13,18;STRAND=+-;AF=0.308;STDEV_LEN=111.723;STDEV_POS=9.192 GT:GQ:DR:DV 0/1:14:9:4
chr1 166033 Sniffles2.DEL.81S0 N <DEL> 60 PASS PRECISE;SVTYPE=DEL;SVLEN=-115;END=166148;SUPPORT=9;COVERAGE=13,13,13,13,18;STRAND=+-;AF=0.692;STDEV_LEN=0.000;STDEV_POS=0.000 GT:GQ:DR:DV 0/1:14:4:9
Anything I could adjust? most importantly to get rid of this false positive deletion at the beginning of the call? Alternatively, I think I'll just merge overlapping deletions/insertion call without using a bed file. Anything potentially problematic with this? I am testing this on homozygous samples so there should not be an issue with haplotypes.
Thanks again
Hello again!
it seems from your results that your region may not benefit from this mode, which is designed to handle simple repeats, as you noted. In some cases such as the one you are seeing, the alignments will not allow Sniffles to merge SVs - because there is no (or not enough) signal to distinguish between multiple SVs versus a single large one. This can be either due to the aligner used, or more generally due to the structure of the region.
If you are certain that this is one SV biologically, I would also suggest manually merging it. We are however planning to introduce postprocessing scripts for different biological entities (such as transposons) that may lead to complex SVs in the future.
Thanks and best regards, Moritz
Awesome, I'll keep an eye out for this post-processing script For now, at least in these simple homozygous cases, all the overlapping chunks appear to contain simple, single, large events so it looks safe to merge them.
Definitely could be aligner specific. In my case I have used minimap2...so perhaps it would be worthwhile testing NGMLR (if I had the time), especially considering if my memory serves me correct that it was performing better at split read alignment, perhaps better allowing reads to span large deletions/insertions.
Thanks for the help (i'll leave you to close the comment if you are happy to).
Me again, Just to note, I just tried it with NGMLR, as the overlap aggregation of the minimap2 calls didn't work as well as I hoped, and sadly using NGMLR did not solve the problem. In fact now with NGMLR there are a lot of enormous inversion calls which appear around these same regions now. Thanks again
Sounds like your region also leads to disagreement between mappers in terms of how they split up read alignments - this can happen in regions containing complex / fragmented SVs. If you know the biological ground truth, a specific postprocessing script for these cases is probably the best solution, but also requires understanding what exactly happens with the read alignments in those regions (why they get fragmented) and may not deliver the best results, as you noted. If your goal is to find a way to detect other transposons of this kind, what also could be worth a try is identifying a shared signature that they leave in the read alignments (and successive SV calls).