
Strelka variant found where there is no coverage

Open osowiecki opened this issue 5 years ago • 2 comments

I've just checked and I can see a variant:

1 211980 . C CAG 53 PASS CIGAR=1M2I;RU=AG;REFREP=0;IDREP=1;MQ=60 GT:GQ:GQX:DPI:AD:ADF:ADR:FT:PL 1/1:12:10:23:0,5:0,3:0,2:PASS:94,15,0

This variant has 0 coverage in the BAM file. I checked every single position with bedtools genomecov. Can anyone explain to me how Strelka can find variants where there are no alignments?

It looks like Strelka ignores positions with depth 0 and calculates depth from the surrounding nonzero positions, while still claiming that there is a variant at this position.

samtools depth -r 1:211915-211991 2.bam

1 211915 27
1 211916 27
1 211917 27
1 211918 27
1 211919 26
1 211920 26
1 211921 26
1 211922 26
1 211923 26
1 211985 17
1 211986 17
1 211987 17
1 211988 17
1 211989 17
1 211990 17
1 211991 17
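By default samtools depth leaves out positions with zero depth (the -a option prints them), so the jump from 211923 to 211985 in the output above is the uncovered stretch. A minimal Python sketch of how that gap could be located programmatically; the function name find_depth_gaps and the embedded DEPTH_OUTPUT string are illustrative only, not part of samtools or Strelka:

```python
from typing import Iterable, List, Tuple


def find_depth_gaps(depth_lines: Iterable[str], start: int, end: int) -> List[Tuple[int, int]]:
    """Return inclusive (first, last) ranges within [start, end] that have no reported depth.

    Expects lines in the three-column `samtools depth` layout: chrom, position, depth.
    Hypothetical helper for illustration; assumes a single chromosome.
    """
    covered = set()
    for line in depth_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        pos, depth = int(fields[1]), int(fields[2])
        if depth > 0:
            covered.add(pos)

    gaps = []
    gap_start = None
    for pos in range(start, end + 1):
        if pos not in covered:
            if gap_start is None:
                gap_start = pos
        elif gap_start is not None:
            gaps.append((gap_start, pos - 1))
            gap_start = None
    if gap_start is not None:
        gaps.append((gap_start, end))
    return gaps


# The depth output from this post, copied as-is.
DEPTH_OUTPUT = """\
1 211915 27
1 211916 27
1 211917 27
1 211918 27
1 211919 26
1 211920 26
1 211921 26
1 211922 26
1 211923 26
1 211985 17
1 211986 17
1 211987 17
1 211988 17
1 211989 17
1 211990 17
1 211991 17
"""

gaps = find_depth_gaps(DEPTH_OUTPUT.splitlines(), 211915, 211991)
print(gaps)  # [(211924, 211984)]
```

The reported indel at 211980 falls inside that single gap, which is the discrepancy described here.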

osowiecki avatar Sep 02 '19 11:09 osowiecki

I am not a Strelka developer, but this is most likely due to the local realignment that happens before the variant calling step. From the method overview in the user guide: "This is followed by segmenting of the genome for parallel processing, where within each segment all input samples are jointly analyzed to identify candidate alleles, realign all input reads, analyze reads to make model specific variant inferences, then compute properties of each variant used to apply filters or empirically recalibrate confidence that each variant represents a germline or somatic variant in the input sample(s)" (https://github.com/Illumina/strelka/blob/v2.9.x/docs/userGuide/README.md#method-overview).

So the BAM that you see (and use as input) is not the BAM that the variant calling process sees.

SebastianHollizeck avatar Sep 03 '19 05:09 SebastianHollizeck

That might be the case. Thank you. I've been trying to determine whether a given variant from a particular sample in a multisample VCF is marked as not present because of low coverage (first column) or because there really is no difference there and the genotype is 0/0 at that position in this sample.

40 1 211862 . A G 438.72 PASS AC=50;AN=50;MQ=60;SF=0,1,2,3;SNVHPOL=3 GT:FT:GQX:DP:ADR:DPF:SB:PL:ADF:GQ:AD 1/1:PASS:30:38:0,20:0:-60.2:370,114,0:0,18:111:0,38
41 1 211863 . G A 442.84 PASS AC=50;AN=50;MQ=60;SF=0,1,2,3;SNVHPOL=3 GT:SB:DP:GQX:FT:DPF:ADR:AD:PL:GQ:ADF 1/1:-57.4:37:30:PASS:1:0,20:0,37:370,111,0:108:0,17
45 1 211864 . G A 420.28 PASS AC=50;AN=50;MQ=60;SF=0,1,2,3;SNVHPOL=2 GT:ADR:DPF:FT:DP:GQX:SB:ADF:GQ:PL:AD 1/1:0,21:0:PASS:39:30:-57.4:0,18:114:370,117,0:0,39
43 1 211884 . T A 451.84 PASS AC=50;AN=50;MQ=60;SF=0,1,2,3;SNVHPOL=4 GT:AD:GQ:ADF:PL:SB:DPF:ADR:DP:GQX:FT 1/1:0,37:108:0,19:370,111,0:-61.2:2:0,18:37:30:PASS
0 1 211980 . C CAG 101.84 PASS AC=33;AN=42;CIGAR=1M2I;IDREP=1;MQ=60;REFREP=0;RU=AG;SF=0,1,2,3 GT:PL:PS:GQ:ADF:AD:DPI:GQX:FT:ADR 1/1:94,15,0:.:12:0,3:0,5:23:10:PASS:0,2
6 1 211982 . TGC T 107.76 PASS AC=33;AN=42;CIGAR=1M2D;IDREP=0;MQ=60;REFREP=1;RU=GC;SF=0,1,2,3 GT:GQX:FT:ADR:PS:PL:GQ:ADF:AD:DPI 1/1:10:PASS:0,2:.:94,15,0:12:0,3:0,5:24
33 1 212033 . T C 260.39 PASS AC=14;AN=22;MQ=60;SF=0,2,3;SNVHPOL=2 GT:SB:DPF:ADR:DP:GQX:FT:AD:GQ:ADF:PL 1/1:-47.5:0:0,13:29:30:PASS:0,29:84:0,16:370,87,0
40 1 212362 . G C 298.08 PASS AC=33;AN=46;MQ=60;SF=0,1,2,3;SNVHPOL=4 GT:GQX:DP:FT:DPF:ADR:SB:PL:GQ:ADF:AD 1/1:22:34:PASS:1:0,19:-51:370,102,0:99:0,15:0,34
35 1 212464 . G GA 216.92 PASS AC=17;AN=28;CIGAR=1M1I;IDREP=9;MQ=60;REFREP=8;RU=A;SF=0,2,3 GT:GQX:FT:ADR:AD:DPI:PL:GQ:ADF 1/1:27:PASS:0,13:0,28:29:541,82,0:79:0,15
45 1 212740 rs851821369 A G 352.33 PASS AC=36;AN=46;MQ=60;SF=0,1,2,3;SNVHPOL=2 GT:SB:DP:GQX:FT:DPF:ADR:AD:PL:GQ:ADF 1/1:-55.1:39:30:PASS:0:0,16:0,39:370,117,0:114:0,23
47 1 217595 . C A 324.26 PASS AC=14;AN=22;MQ=60;SF=0,2,3;SNVHPOL=2 GT:FT:GQX:DP:ADR:DPF:SB:PL:PS:ADF:GQ:AD 1/1:PASS:30:42:0,23:0:-63.5:370,126,0:.:0,19:123:0,42
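Since the order of the FORMAT keys differs from record to record in the merged VCF above, the per-sample depth has to be looked up by key, not by column index. A minimal Python sketch, assuming (as in the records above) that SNV records carry DP and indel records carry DPI; the helper names parse_sample and call_depth are made up for illustration:

```python
from typing import Dict, Optional


def parse_sample(format_field: str, sample_field: str) -> Dict[str, str]:
    """Pair each FORMAT key with the matching value from one sample column."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def call_depth(format_field: str, sample_field: str) -> Optional[int]:
    """Return the depth Strelka reports for a call: DP if present, otherwise DPI.

    Returns None when neither key is present or the value is missing (".").
    """
    values = parse_sample(format_field, sample_field)
    for key in ("DP", "DPI"):
        value = values.get(key)
        if value is not None and value != ".":
            return int(value)
    return None


# FORMAT and sample columns of the 211980 indel and the 211862 SNV from the records above.
indel_depth = call_depth(
    "GT:PL:PS:GQ:ADF:AD:DPI:GQX:FT:ADR",
    "1/1:94,15,0:.:12:0,3:0,5:23:10:PASS:0,2",
)
snv_depth = call_depth(
    "GT:FT:GQX:DP:ADR:DPF:SB:PL:ADF:GQ:AD",
    "1/1:PASS:30:38:0,20:0:-60.2:370,114,0:0,18:111:0,38",
)
print(indel_depth, snv_depth)  # 23 38
```

Comparing these values with the first column makes it easier to see where the two disagree, as they do for the indel at 211980.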

osowiecki avatar Sep 03 '19 05:09 osowiecki