
No OLD_VARIANT_ID?

Open kmegq opened this issue 3 years ago • 1 comments

Hello, thank you for making this awesome tool!

I have a question about the "OLD_VARIANT_ID" field. I regenotyped a VCF of SV sites called with Manta. In the Graphtyper output, I am finding some cases where there is no OLD_VARIANT_ID. In the cases that I have looked into further, there seems to be an insertion or deletion in my original SV list starting one BP prior to the variant listed by Graphtyper. Graphtyper does not find support for the original variant listed in these cases.

Examples:

Original SV:
chr2    68441473        MantaINS:264388:0:0:0:0:0       TTAATTCAGAG     TATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTTC     .       PASS    CIGAR=1M50I10D;END=68441483;SVLEN=50;SVTYPE=INS GT:PR:SR

Graphtyper output with no OLD_VARIANT_ID:
chr2    68441474        chr2:68441474:XG        ABHet=0.4;ABHetMulti=0.4,0.6;ABHom=0.9779;ABHomMulti=0.9779,-1;AC=2;AF=0.04348;AN=46;CR=0;CRal=0,0;CRalt=0;MMal=3477,294;MMalt=1.27826;MQ=60;MQSal=2736000,82800;MQalt=60;MQsquared=2887200;MaxAAS=3;MaxAASR=0.75;NHet=2;NHomAlt=0;NHomRef=21;PASS_AC=0;PASS_AN=36;PASS_ratio=0.7826;QD=8.571;QDalt=8.571;RefLen=10;SB=0.5313;SBAlt=0.3478;SBF=408,8;SBF1=249,2;SBF2=159,6;SBR=352,15;SBR1=197,13;SBR2=155,2;SDal=112478,3251;SDalt=141.348;SeqDepth=802;VarType=XG

Original SV:
chr3    8361978 MantaDEL:6585:16:16:0:0:0       TAGACAGATCTTTTATCTAAGTAGTCAAGAGCGTCATACACAGAAAGAGAGAATGAGGCAGAGACACAAGCAAATGGAGAAG      TCGGCAGCCCAGGTGGCTCAGTGGTTTAGCGCCTCCTTCAGCCCAGGGTGTGATCCTCGGGTCCTGGGATCGAGTCCCACAT      .       PASS    CIGAR=1M81I81D;END=8362059;SVLEN=-81;SVTYPE=DEL GT:PR:SR

Graphtyper output with no OLD_VARIANT_ID:
chr3    8361979 chr3:8361979:XG ABHet=0.6591;ABHetMulti=0.6591,0.3409;ABHom=0.9;ABHomMulti=-1,0.9;AC=44;AF=0.9565;AN=46;CR=0;CRal=0,0;CRalt=0;MMal=1337,1328;MMalt=0.257864;MQ=58;MQSal=180252,1781702;MQalt=59;MQsquared=1972754;MaxAAS=32;MaxAASR=1;NHet=2;NHomAlt=21;NHomRef=0;PASS_AC=25;PASS_AN=26;PASS_ratio=0.5652;QD=24.97;QDalt=24.97;RefLen=81;SB=0.2842;SBAlt=0.2194;SBF=53,113;SBF1=33,91;SBF2=20,22;SBR=16,402;SBR1=15,243;SBR2=1,159;SDal=9560,55628;SDalt=108.016;SeqDepth=587;VarType=XG

There seem to be 92 instances of this in the output VCF, out of 12,885 total output variants.

Thank you for your help!

Best, Kate

kmegq, Aug 27 '21 17:08

Hello Kate,

The problem is that OLD_VARIANT_ID is only stored for SVs, and graphtyper considers these examples to be small variants. For insertions and deletions I calculate SVLEN=length(ALT)-length(REF), and I only consider the event to be an SV if SVLEN>=50 or SVLEN<=-50. For events with -50<SVLEN<50 it is usually better to use the graphtyper small variant genotyping model.

For example, for MantaINS:264388:0:0:0:0:0, SVLEN=length(TATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTTC)-length(TTAATTCAGAG)=51-11=40, which is below the threshold of 50 even though the Manta record lists SVLEN=50.

The alleles of MantaDEL:6585:16:16:0:0:0 are actually the same size:

TAGACAGATCTTTTATCTAAGTAGTCAAGAGCGTCATACACAGAAAGAGAGAATGAGGCAGAGACACAAGCAAATGGAGAAG
TCGGCAGCCCAGGTGGCTCAGTGGTTTAGCGCCTCCTTCAGCCCAGGGTGTGATCCTCGGGTCCTGGGATCGAGTCCCACAT

so SVLEN=0.
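
To check which input records fall under this rule, the calculation above can be sketched in a few lines of Python. This is only an illustration of the rule as described in this reply, not graphtyper's actual code: the names `svlen`, `is_sv`, and `SV_MIN_ABS_LEN` are made up for the example, and the REF/ALT strings are copied from the two Manta records quoted in the question. For those two records the lengths worked out above are 40 and 0.

```python
# Hypothetical helper names; this mirrors the rule described above,
# it is not taken from the graphtyper source.
SV_MIN_ABS_LEN = 50  # threshold stated in the reply: SVLEN>=50 or SVLEN<=-50


def svlen(ref: str, alt: str) -> int:
    """SVLEN computed from the allele sequences: length(ALT) - length(REF)."""
    return len(alt) - len(ref)


def is_sv(ref: str, alt: str, min_abs_len: int = SV_MIN_ABS_LEN) -> bool:
    """True if the event would be treated as an SV under the rule above."""
    return abs(svlen(ref, alt)) >= min_abs_len


# REF/ALT copied from the two Manta records in the question.
records = {
    "MantaINS:264388:0:0:0:0:0": (
        "TTAATTCAGAG",
        "TATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTATTTTC",
    ),
    "MantaDEL:6585:16:16:0:0:0": (
        "TAGACAGATCTTTTATCTAAGTAGTCAAGAGCGTCATACACAGAAAGAGAGAATGAGGCAGAGACACAAGCAAATGGAGAAG",
        "TCGGCAGCCCAGGTGGCTCAGTGGTTTAGCGCCTCCTTCAGCCCAGGGTGTGATCCTCGGGTCCTGGGATCGAGTCCCACAT",
    ),
}

for variant_id, (ref, alt) in records.items():
    kind = "SV" if is_sv(ref, alt) else "small variant"
    print(f"{variant_id}: SVLEN={svlen(ref, alt)} -> {kind}")
```

Note that the length is taken from the REF and ALT sequences, not from the SVLEN value in the INFO field, which is why a record annotated with SVLEN=50 or SVLEN=-81 can still end up in the small variant category.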

Best, Hannes

hannespetur, Aug 31 '21 08:08