graphtyper
No OLD_VARIANT_ID?
Hello, thank you for making this awesome tool!
I have a question about the "OLD_VARIANT_ID" field. I regenotyped a VCF of SV sites called with Manta. In the Graphtyper output, I am finding some cases where there is no OLD_VARIANT_ID. In the cases that I have looked into further, there seems to be an insertion or deletion in my original SV list starting one BP prior to the variant listed by Graphtyper. Graphtyper does not find support for the original variant listed in these cases.
Examples:
Original SV:
chr2 68441473 MantaINS:264388:0:0:0:0:0 TTAATTCAGAG TATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTTC . PASS CIGAR=1M50I10D;END=68441483;SVLEN=50;SVTYPE=INS GT:PR:SR
Graphtyper output with no OLD_VARIANT_ID:
chr2 68441474 chr2:68441474:XG ABHet=0.4;ABHetMulti=0.4,0.6;ABHom=0.9779;ABHomMulti=0.9779,-1;AC=2;AF=0.04348;AN=46;CR=0;CRal=0,0;CRalt=0;MMal=3477,294;MMalt=1.27826;MQ=60;MQSal=2736000,82800;MQalt=60;MQsquared=2887200;MaxAAS=3;MaxAASR=0.75;NHet=2;NHomAlt=0;NHomRef=21;PASS_AC=0;PASS_AN=36;PASS_ratio=0.7826;QD=8.571;QDalt=8.571;RefLen=10;SB=0.5313;SBAlt=0.3478;SBF=408,8;SBF1=249,2;SBF2=159,6;SBR=352,15;SBR1=197,13;SBR2=155,2;SDal=112478,3251;SDalt=141.348;SeqDepth=802;VarType=XG
Original SV:
chr3 8361978 MantaDEL:6585:16:16:0:0:0 TAGACAGATCTTTTATCTAAGTAGTCAAGAGCGTCATACACAGAAAGAGAGAATGAGGCAGAGACACAAGCAAATGGAGAAG TCGGCAGCCCAGGTGGCTCAGTGGTTTAGCGCCTCCTTCAGCCCAGGGTGTGATCCTCGGGTCCTGGGATCGAGTCCCACAT . PASS CIGAR=1M81I81D;END=8362059;SVLEN=-81;SVTYPE=DEL GT:PR:SR
Graphtyper output with no OLD_VARIANT_ID:
chr3 8361979 chr3:8361979:XG ABHet=0.6591;ABHetMulti=0.6591,0.3409;ABHom=0.9;ABHomMulti=-1,0.9;AC=44;AF=0.9565;AN=46;CR=0;CRal=0,0;CRalt=0;MMal=1337,1328;MMalt=0.257864;MQ=58;MQSal=180252,1781702;MQalt=59;MQsquared=1972754;MaxAAS=32;MaxAASR=1;NHet=2;NHomAlt=21;NHomRef=0;PASS_AC=25;PASS_AN=26;PASS_ratio=0.5652;QD=24.97;QDalt=24.97;RefLen=81;SB=0.2842;SBAlt=0.2194;SBF=53,113;SBF1=33,91;SBF2=20,22;SBR=16,402;SBR1=15,243;SBR2=1,159;SDal=9560,55628;SDalt=108.016;SeqDepth=587;VarType=XG
There seem to be 92 instances of this in the output VCF, out of 12,885 total output variants.
Thank you for your help!
Best, Kate
Hello Kate,
The problem is that OLD_VARIANT_ID is only stored for SVs, and graphtyper considers these examples to be small variants. For insertions and deletions I calculate SVLEN=length(ALT)-length(REF), and I only consider the event to be an SV if SVLEN>=50 or SVLEN<=-50. For events with -50<SVLEN<50 it is usually better to use the graphtyper small variant genotyping model.
For example, for MantaINS:264388:0:0:0:0:0, SVLEN=length(TATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTTC)-length(TTAATTCAGAG)=51-11=40.
The alleles of MantaDEL:6585:16:16:0:0:0 are actually the same size:
TAGACAGATCTTTTATCTAAGTAGTCAAGAGCGTCATACACAGAAAGAGAGAATGAGGCAGAGACACAAGCAAATGGAGAAG
TCGGCAGCCCAGGTGGCTCAGTGGTTTAGCGCCTCCTTCAGCCCAGGGTGTGATCCTCGGGTCCTGGGATCGAGTCCCACAT
so SVLEN=0.
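The rule described above can be sketched in a few lines of Python, which may help when pre-screening a Manta VCF for records that will not get an OLD_VARIANT_ID. This is only an illustration of the stated threshold, not graphtyper's actual code: the names `svlen`, `is_sv`, and `SV_MIN_ABS_LEN` are hypothetical, and the sketch assumes explicit sequence alleles (symbolic alleles such as `<DEL>` are not handled).

```python
# Minimal sketch of the SV / small-variant split described in this reply.
# Assumption: REF and ALT are explicit sequences, not symbolic alleles.

SV_MIN_ABS_LEN = 50  # threshold quoted in the reply: SVLEN>=50 or SVLEN<=-50


def svlen(ref: str, alt: str) -> int:
    """Allele length difference: length(ALT) - length(REF)."""
    return len(alt) - len(ref)


def is_sv(ref: str, alt: str, min_abs_len: int = SV_MIN_ABS_LEN) -> bool:
    """True if the length difference reaches the SV threshold in either direction."""
    return abs(svlen(ref, alt)) >= min_abs_len


# REF/ALT pairs copied from the two records quoted in this thread.
records = {
    "MantaINS:264388:0:0:0:0:0": (
        "TTAATTCAGAG",
        "TATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTTC",
    ),
    "MantaDEL:6585:16:16:0:0:0": (
        "TAGACAGATCTTTTATCTAAGTAGTCAAGAGCGTCATACACAGAAAGAGAGAATGAGGCAGAGACACAAGCAAATGGAGAAG",
        "TCGGCAGCCCAGGTGGCTCAGTGGTTTAGCGCCTCCTTCAGCCCAGGGTGTGATCCTCGGGTCCTGGGATCGAGTCCCACAT",
    ),
}

for variant_id, (ref, alt) in records.items():
    kind = "SV" if is_sv(ref, alt) else "small variant"
    print(f"{variant_id}\tSVLEN={svlen(ref, alt)}\t{kind}")
```

Note that the length is computed from the alleles themselves, so it can differ from the SVLEN value in the INFO column of the input VCF (SVLEN=50 and SVLEN=-81 in the two records above).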
Best, Hannes