
coverage inconsistency in sniffles v2

Open vmukhina opened this issue 2 years ago • 5 comments

Hi, I performed a kind of robustness test on my virus + human data by randomly subsampling reads and calling SVs with Sniffles v2. Surprisingly, the SV call set from all reads misses some SVs that were identified in the low-coverage replicates, and it does not correlate well with changes in copy number. I wonder what could cause this behavior and how to avoid it.

I use the filters `--non-germline --minsupport 5`, and I expected higher coverage to yield more SVs without losing any, but that is not what happens. Below is a BEDPE representation of the SVs on the viral part of the reference for all reads and for a few 20% read replicates.
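The expectation above can be made concrete. This is a minimal sketch, assuming each read is kept independently with probability `fraction` and that `--minsupport` is the only filter applied; `prob_retained` is a hypothetical helper, not part of Sniffles. Because a replicate's reads are a subset of the full read set, an SV that reaches the support threshold in a replicate also reaches it with all reads under this model.

```python
from math import comb


def prob_retained(full_support: int, fraction: float, min_support: int) -> float:
    """Probability that an SV with `full_support` supporting reads keeps at least
    `min_support` of them when each read is kept independently with probability
    `fraction` (binomial model; hypothetical helper, not part of Sniffles)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if full_support < min_support:
        return 0.0  # cannot reach the threshold even before subsampling
    # Sum the small lower tail P(X < min_support) instead of the upper tail,
    # so huge binomial coefficients at very high depth never hit float overflow.
    below = sum(
        comb(full_support, k) * fraction**k * (1.0 - fraction) ** (full_support - k)
        for k in range(min_support)
    )
    return max(0.0, 1.0 - below)


if __name__ == "__main__":
    # Chance an SV survives --minsupport 5 in a 20% replicate, for a few
    # illustrative full-data support levels (not values from this issue).
    for support in (10, 25, 50, 100):
        print(f"{support:>4} reads -> {prob_retained(support, 0.2, 5):.3f}")
```

Under this model, subsampling can only remove calls, so SVs that appear in a replicate but not in the full-coverage set point to something other than the support threshold, such as the additional filtering discussed later in the thread.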

I also tested Sniffles v1 on the same data, and its output is more robust and contains more SVs.

[Image: BEDPE view of the SV calls on the viral reference]
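Comparisons like these (all reads vs. replicates, or Sniffles v1 vs. v2) can be scripted. A minimal sketch, assuming tab-separated BEDPE with the standard first six columns (chrom1, start1, end1, chrom2, start2, end2) and treating two calls as the same SV when both breakpoints lie on the same chromosomes within a hypothetical tolerance `tol`. The helper names and the `virus` contig name are illustrative, and the matching rule ignores SV type and strand.

```python
from typing import Iterable, List, NamedTuple


class BreakendPair(NamedTuple):
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def parse_bedpe(lines: Iterable[str]) -> List[BreakendPair]:
    """Read the first six BEDPE columns; each breakpoint is its interval midpoint."""
    calls = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and UCSC header lines
        f = line.rstrip("\n").split("\t")
        calls.append(BreakendPair(
            f[0], (int(f[1]) + int(f[2])) // 2,
            f[3], (int(f[4]) + int(f[5])) // 2,
        ))
    return calls


def same_sv(a: BreakendPair, b: BreakendPair, tol: int) -> bool:
    """Same chromosomes and both breakpoints within `tol` bp (illustrative rule)."""
    return (a.chrom1 == b.chrom1 and a.chrom2 == b.chrom2
            and abs(a.pos1 - b.pos1) <= tol and abs(a.pos2 - b.pos2) <= tol)


def missing_from(reference: List[BreakendPair], query: List[BreakendPair],
                 tol: int = 100) -> List[BreakendPair]:
    """Calls in `query` (e.g. a 20% replicate) with no match in `reference` (e.g. all reads)."""
    return [q for q in query if not any(same_sv(q, r, tol) for r in reference)]


if __name__ == "__main__":
    # Toy coordinates, not data from this issue.
    full = parse_bedpe(["virus\t100\t101\tvirus\t5000\t5001\n"])
    replicate = parse_bedpe([
        "virus\t120\t121\tvirus\t4990\t4991\n",
        "virus\t9000\t9001\tvirus\t12000\t12001\n",
    ])
    print(missing_from(full, replicate))
```

The tolerance matters: breakpoints of the same SV can shift by tens of base pairs between runs, so an exact-coordinate comparison would overstate the differences between call sets.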

vmukhina, Feb 16 '23 18:02

Thanks for that and sorry for the late reply.

Yes, Sniffles 2 currently applies much more filtering. You can disable it with the `--no-qc` parameter if you want.

So let's see if I read this correctly. What you are showing here are the different subsampling replicates, and Sniffles sometimes fails to identify the left variant and some of the others. Could you tell me what the coverage is on the different tiers?

Also, please let me know whether these runs all used default parameters.

Thank you,
Fritz

fritzsedlazeck, Jun 15 '23 16:06

Hi! The overall "complete" track coverage is about 45x, but the virus chromosome has many copies, so its coverage goes up to 5489x (shown in the coverage track). Iter 1 to 4 are four random 20% subsampling replicates. I believe I used something like `sniffles -i file.bam --vcf 02.iter2.vcf --tandem-repeats repeats.trf.bed --non-germline --reference ref.fa --threads 15 --minsupport 5` for the Sniffles2 runs.

Thank you,
Vera

vmukhina, Jun 16 '23 17:06

Interesting. Yes, the non-germline mode is tricky, and we are working on improving it. The default mode (without this flag) may be more consistent. Ideally, we will also make the germline mode more consistent.

Thanks,
Fritz

fritzsedlazeck, Jun 16 '23 18:06

Will Sniffles catch heterogeneous SVs (present in only a small fraction of the reads covering the region of interest) without this flag?

vmukhina, Jun 19 '23 21:06

So this all comes down to the number, or ratio, of supporting reads. If the variant is heterozygous (e.g. ~0.3-0.7 of reads supporting), the default mode is the way to go. If you are looking for what we refer to as mosaic variants (a ratio of ~0.05-0.2 of reads supporting), you will need the mosaic (non-germline) mode. I know this is a bit confusing, but very different thresholds and algorithms are in place to enable this.
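The ratio ranges quoted above can be read as a simple lookup. This is a minimal sketch, assuming the ratio is just supporting reads divided by total reads at the locus; `support_class` is a hypothetical helper that only restates the approximate ranges from this reply and does not reproduce Sniffles' actual germline or mosaic logic.

```python
def support_class(supporting_reads: int, total_reads: int) -> str:
    """Bucket a supporting-read ratio using the approximate ranges quoted in this thread.

    Illustrative only (hypothetical helper); Sniffles' own logic is more involved.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    ratio = supporting_reads / total_reads
    if 0.3 <= ratio <= 0.7:
        return "heterozygous"   # default (germline) mode
    if 0.05 <= ratio <= 0.2:
        return "mosaic"         # --non-germline mode
    return "unclassified"       # outside the ranges quoted in the thread


if __name__ == "__main__":
    # Illustrative read counts, not values from this issue.
    for support, depth in ((20, 45), (300, 5489), (2, 45)):
        print(f"{support}/{depth} = {support / depth:.3f} -> {support_class(support, depth)}")
```

Depth changes what a given ratio means in read counts: at about 45x, a variant at a 0.05 ratio is expected to have only about 2 supporting reads (45 × 0.05 = 2.25), below `--minsupport 5`, whereas at 5489x the same ratio corresponds to roughly 274 reads.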

We are working on a combined approach but are facing some difficulties with it (this gets complicated very quickly). Nevertheless, it will come two releases from now.

I hope that helps.
Fritz

fritzsedlazeck, Jun 19 '23 21:06