
difference between SUPPORT and DV

Open • prasundutta87 opened this issue 1 year ago • 8 comments

Hi,

I was looking at an SV event in IGV. The VCF record from Sniffles2 has SUPPORT=1, but DV=2. My understanding is that SUPPORT is the number of reads supporting the SV and DV is the number of variant-supporting reads. What is the difference between the two? It was an insertion event, and I could see only 1 read with the insertion. Why was DV 2 in this case? Any explanation would be very helpful.

Regards, Prasun

prasundutta87 · Mar 08 '23 16:03

Can you show the command that you ran? I am trying to understand whether this was run in non-germline or default mode. Thx, Fritz

fritzsedlazeck · Mar 08 '23 16:03

Hi,

This is the command and it was run in default mode:

sniffles --input snf_files_list.tsv --tandem-repeats $TANDEM_REPEAT_PATH/human_GRCh38_no_alt_analysis_set.trf.bed --minsvlen 50 --reference $REFERENCE --threads 16 --vcf output_multisample.vcf.gz

Regards, Prasun

prasundutta87 · Mar 08 '23 16:03

Any update on this issue? I have the same question: we use the DV field for variant filtering, but it doesn't seem to match the SUPPORT field. I ran Sniffles in default mode.
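For anyone filtering on DV in the meantime, one cautious stopgap is to filter on the number of read names listed in RNAMES instead. The stdlib-only Python sketch below assumes the VCF actually contains RNAMES (in Sniffles2 this field is written when running with `--output-rnames`); `raw_read_support`, `keep_record` and `filter_vcf` are hypothetical helper names, not part of Sniffles.

```python
from typing import Iterable, Iterator, Optional


def raw_read_support(info_field: str) -> Optional[int]:
    """Count the read names listed in RNAMES; return None if the field is absent."""
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        if key == "RNAMES":
            return len([name for name in value.split(",") if name])
    return None


def keep_record(line: str, min_reads: int) -> bool:
    """Keep a VCF data line if at least `min_reads` reads are named in RNAMES."""
    info_field = line.rstrip("\n").split("\t")[7]  # the 8th VCF column is INFO
    n_reads = raw_read_support(info_field)
    return n_reads is not None and n_reads >= min_reads


def filter_vcf(lines: Iterable[str], min_reads: int) -> Iterator[str]:
    """Pass header lines through unchanged; filter data lines by RNAMES count."""
    for line in lines:
        if line.startswith("#") or keep_record(line, min_reads):
            yield line
```

Usage would look like `with open("calls.vcf") as fh: kept = list(filter_vcf(fh, min_reads=5))`, where the file name and threshold are placeholders; for a bgzipped VCF, open it with `gzip.open(path, "rt")` instead. Records without RNAMES are dropped rather than guessed at.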

Akazhiel · Mar 20 '23 10:03

Same here!

christopher-schroeder · Apr 08 '24 09:04

Sorry all for the slow reply. DV refers to the number of reads; SUPPORT should be the number of samples in a multi-sample VCF from Sniffles. In retrospect, I see how that can be confusing. Sorry, Fritz

fritzsedlazeck · Apr 08 '24 13:04

Don't worry, we all know what it is like to be busy. Thanks for answering!

It seems that I fundamentally misunderstand something. I assumed that DV = |RNAMES|, i.e. the number of read names listed in RNAMES, but this does not hold for any of my SVs.

One example:

15	44633572	Sniffles2.INS.22CS6	N	AAAAAGA...AAAAAAAAAAAAAAAA	60	PASS	PRECISE;SVTYPE=INS;SVLEN=4256;END=44633572;SUPPORT=6;RNAMES=8818c927-5059-46ee-a14f-93f60b12f1e4,06ae71db-bd97-4956-a277-60aad89e8695,2a67c270-01c5-4020-8461-d6694d7c0a27,67b0b91c-e6d9-4c33-9fb6-1ab8d5c381e2,79e3f01b-4bf9-4af5-9e45-4bb0d87e64c0,93a83dd7-dc2a-4fa5-b9cc-4f0cd14f041e;COVERAGE=12,12,14,14,15;STRAND=+-;AF=0.929;PHASE=1,44606444,6,6,PASS,PASS;STDEV_LEN=14.660;STDEV_POS=0.000;SUPPORT_LONG=0	GT:GQ:DR:DV	1/1:26:1:13

(I shortened the sequence in the ALT field for readability)

We deal with only one sample, so there is no multi-sample calling. There are 6 RNAMES, so 6 reads support my SV, and I can visually confirm these in IGV. SUPPORT=6, so this likely refers to the 6 reads listed in RNAMES. Visually, I would agree with 6 out of 14 reads, which is roughly 43%.

But then

DV=13 DR=1 AF=0.929 GT=1/1

The value of DV does not reflect the number of RNAMES at all, so apparently my assumption is wrong. Could you help me? Maybe Sniffles performs internal realignments, so that more supporting reads are counted in DV?
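To make the mismatch concrete, here is a small stdlib-only Python sketch that parses the record above (ALT shortened exactly as in the post) and compares SUPPORT, the number of entries in RNAMES, and DV. The last step checks the assumption that AF is computed as DV/(DV+DR); `parse_record` is a hypothetical helper, not a Sniffles function.

```python
# Sniffles2 record from the post above; ALT is shortened as in the post.
RNAMES = ",".join([
    "8818c927-5059-46ee-a14f-93f60b12f1e4",
    "06ae71db-bd97-4956-a277-60aad89e8695",
    "2a67c270-01c5-4020-8461-d6694d7c0a27",
    "67b0b91c-e6d9-4c33-9fb6-1ab8d5c381e2",
    "79e3f01b-4bf9-4af5-9e45-4bb0d87e64c0",
    "93a83dd7-dc2a-4fa5-b9cc-4f0cd14f041e",
])
INFO = (
    "PRECISE;SVTYPE=INS;SVLEN=4256;END=44633572;SUPPORT=6;"
    f"RNAMES={RNAMES};COVERAGE=12,12,14,14,15;STRAND=+-;AF=0.929;"
    "PHASE=1,44606444,6,6,PASS,PASS;STDEV_LEN=14.660;STDEV_POS=0.000;SUPPORT_LONG=0"
)
RECORD = "\t".join([
    "15", "44633572", "Sniffles2.INS.22CS6", "N", "AAAAAGA...AAAAAAAAAAAAAAAA",
    "60", "PASS", INFO, "GT:GQ:DR:DV", "1/1:26:1:13",
])


def parse_record(line: str) -> tuple[dict[str, str], dict[str, str]]:
    """Return the INFO field and the first sample's FORMAT values as dicts."""
    cols = line.rstrip("\n").split("\t")
    info = {}
    for entry in cols[7].split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else "flag"  # e.g. PRECISE carries no value
    sample = dict(zip(cols[8].split(":"), cols[9].split(":")))
    return info, sample


info, sample = parse_record(RECORD)
support = int(info["SUPPORT"])
n_rnames = len(info["RNAMES"].split(","))
dv, dr = int(sample["DV"]), int(sample["DR"])
af_from_counts = dv / (dv + dr)

print(f"SUPPORT={support} |RNAMES|={n_rnames} DV={dv} DR={dr}")
print(f"DV/(DV+DR)={af_from_counts:.3f}  reported AF={info['AF']}")
```

This prints `SUPPORT=6 |RNAMES|=6 DV=13 DR=1` followed by `DV/(DV+DR)=0.929  reported AF=0.929`. So SUPPORT agrees with RNAMES, and the reported AF is consistent with being derived from DV and DR (13/14), not from the six listed reads. That is consistent with AF sharing whatever inflated DV, though it does not by itself prove how Sniffles computes AF.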

In case it helps, I have also attached a screenshot of the region.

[Screenshot: IGV snapshot of the region]

THANK YOU!

christopher-schroeder · Apr 08 '24 13:04

I've looked through your code and found the reason. You rescale the support count of long insertions

https://github.com/fritzsedlazeck/Sniffles/blob/fd31ee3962d7c588752e1926176c79912691d54e/src/sniffles/postprocessing.py#L142-L148

in order to increase the sensitivity for detecting them:

https://github.com/fritzsedlazeck/Sniffles/blob/fd31ee3962d7c588752e1926176c79912691d54e/src/sniffles/postprocessing.py#L170
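The long-insertion rescaling described here can be pictured with the following Python sketch. The threshold and the linear scale factor (`LONG_INS_LENGTH`, `RESCALE_BASE`, `RESCALE_MULT`) as well as the `SVCall` and `rescale_support` names are assumptions for illustration, not copied from the linked postprocessing.py, so the real formula and constants may differ.

```python
from dataclasses import dataclass

# ASSUMED constants for illustration only; the real threshold and scaling
# terms live in the Sniffles source/config and may differ.
LONG_INS_LENGTH = 2500  # hypothetical: insertions at least this long get rescaled
RESCALE_BASE = 1.66     # hypothetical intercept of the scale factor
RESCALE_MULT = 0.33     # hypothetical slope per LONG_INS_LENGTH of SV length


@dataclass
class SVCall:
    svtype: str
    svlen: int
    support: int  # raw number of supporting reads


def rescale_support(call: SVCall) -> int:
    """Inflate the support of long insertions; leave all other calls unchanged."""
    if call.svtype != "INS" or call.svlen < LONG_INS_LENGTH:
        return call.support
    factor = RESCALE_BASE + RESCALE_MULT * (call.svlen / LONG_INS_LENGTH)
    return round(call.support * factor)


print(rescale_support(SVCall("INS", 4256, 6)))  # long insertion from the record above
print(rescale_support(SVCall("INS", 300, 6)))   # short insertion: returned unchanged
```

With these assumed values, the six reads of the 4,256 bp insertion are scaled to 13 (factor ≈ 2.22), the same number as the reported DV. That only shows a rescaling of this size could explain the record, not that these are Sniffles' actual constants.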

That seems to be valid. But you also rescale the support when genotyping the call

https://github.com/fritzsedlazeck/Sniffles/blob/fd31ee3962d7c588752e1926176c79912691d54e/src/sniffles/postprocessing.py#L347

and write this rescaled support count into the genotype, at the position that is later output as DV:

https://github.com/fritzsedlazeck/Sniffles/blob/fd31ee3962d7c588752e1926176c79912691d54e/src/sniffles/postprocessing.py#L414

But this rescaled support no longer represents the number of supporting reads, and the allele frequency derived from it is also wrong. In my opinion this is a bug. If you agree, I would write a pull request to change this behavior.
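A minimal sketch of the change proposed here, in Python with hypothetical names (`SVCall`, `scaled_support`, `passes_support_filter`, `genotype_counts`) that are not part of Sniffles: the boosted support decides only whether a call passes, while DV, DR and AF come from the raw read count. It reuses illustrative, assumed rescaling constants and assumes DR is simply the remaining reads covering the locus, which may not be how Sniffles actually derives DR.

```python
from dataclasses import dataclass

# ASSUMED rescaling constants, for illustration only (not from the Sniffles source).
LONG_INS_LENGTH = 2500
RESCALE_BASE = 1.66
RESCALE_MULT = 0.33


@dataclass
class SVCall:
    svtype: str
    svlen: int
    support: int   # raw supporting reads, i.e. |RNAMES|
    coverage: int  # reads covering the locus


def scaled_support(call: SVCall) -> int:
    """Sensitivity-boosted support, used only for the pass/fail decision."""
    if call.svtype != "INS" or call.svlen < LONG_INS_LENGTH:
        return call.support
    factor = RESCALE_BASE + RESCALE_MULT * (call.svlen / LONG_INS_LENGTH)
    return round(call.support * factor)


def passes_support_filter(call: SVCall, min_support: int) -> bool:
    return scaled_support(call) >= min_support


def genotype_counts(call: SVCall) -> tuple[int, int, float]:
    """Return (DR, DV, AF) from the raw read count, never from the scaled one."""
    dv = call.support
    dr = max(call.coverage - dv, 0)  # ASSUMPTION: remaining reads count as reference
    af = dv / (dv + dr) if dv + dr else 0.0
    return dr, dv, af


call = SVCall("INS", 4256, support=6, coverage=14)
print(passes_support_filter(call, min_support=10))  # hypothetical threshold of 10
print(genotype_counts(call))
```

For the example insertion (6 reads, coverage 14), the call still passes the hypothetical minimum support of 10 because its boosted support is 13. The genotype fields, however, report DV=6, DR=8 and AF ≈ 0.43, matching the 6 out of 14 reads visible in IGV.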

christopher-schroeder · Apr 09 '24 08:04

Thanks for digging in. There are two modes: default/germline and mosaic SV calling. I think the scaling applies to the mosaic mode. @hermannromanek and I can check this out to make sure. Thanks, Fritz

fritzsedlazeck · Apr 09 '24 10:04