graphtyper
Not all svimmer INS/DEL/DUP calls are genotyped by graphtyper as expected
@hannespetur
A number of svimmer SVs do not get genotyped. The problem mainly affects INS and DUP calls, but some DEL calls are also skipped.
INSERTIONS
zgrep -v '^#' svimmer/n48299/chr22_svimmer.vcf.gz | grep SVTYPE=INS | wc -l
3800
zgrep -v '^#' graphtyper/test/pVCF/graphtyper.autosome.raw.vcf.gz | grep SVTYPE=INS | grep AGGREG | wc -l
3208
DELETIONS
zgrep -v '^#' svimmer/n48299/chr22_svimmer.vcf.gz | grep SVTYPE=DEL | wc -l
12838
zgrep -v '^#' graphtyper/test/pVCF/graphtyper.autosome.raw.vcf.gz | grep SVTYPE=DEL | grep AGGREG | wc -l
12780
DUPLICATIONS
zgrep -v '^#' svimmer/n48299/chr22_svimmer.vcf.gz | grep SVTYPE=DUP | wc -l
3885
zgrep -v '^#' graphtyper/test/pVCF/graphtyper.autosome.raw.vcf.gz | grep SVTYPE=DUP | grep AGGREG | wc -l
4374
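The three pairs of counts above can be produced in one pass with a small loop. This is only a sketch, not part of graphtyper or svimmer: `count_sv_per_type` is a hypothetical helper, and, like the commands above, it assumes that graphtyper's aggregated SV records can be picked out by the string AGGREG in the record.

```shell
# Hypothetical helper: for each SV type, count svimmer input records and
# graphtyper output records matching "AGGREG" (the same pattern as above).
count_sv_per_type() {
  local svimmer_vcf=$1 graphtyper_vcf=$2 t n_in n_out
  for t in INS DEL DUP; do
    # Input side: every non-header svimmer record of this SV type
    n_in=$(zgrep -v '^#' "$svimmer_vcf" | grep -c "SVTYPE=$t")
    # Output side: only graphtyper records of this type that also match AGGREG
    n_out=$(zgrep -v '^#' "$graphtyper_vcf" | grep "SVTYPE=$t" | grep -c AGGREG)
    echo "$t svimmer=$n_in graphtyper=$n_out"
  done
}
```

Usage: `count_sv_per_type svimmer/n48299/chr22_svimmer.vcf.gz graphtyper/test/pVCF/graphtyper.autosome.raw.vcf.gz`. If the graphtyper VCF covers more than chr22, restricting both inputs to chr22 first keeps the counts comparable.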
In #116, I listed some svimmer examples for deletions.
When I looked more closely at one of them, I found a DEL variant that graphtyper had shifted one bp to the right and that had no SVTYPE annotation.
zgrep 10950658 svimmer/n48299/chr22_svimmer.vcf.gz
chr22 10950658 . AGACCAAAACAAAACAAAAGGCAACATGTGAAGGTACAAAGTGATATATGGAG AAGACCA 0 . END=10950710;SVTYPE=DEL;SVLEN=-52;CIGAR=1M6I52D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
zgrep 10950659 graphtyper/test/pVCF/graphtyper.autosome.raw.vcf.gz
chr22 10950659 chr22:10950659:XG GACCAAAACAAAACAAAAGGCAACATGTGAAGGTACAAAGTGATATATGGAG AGACCA 0 LowQD;LowQUAL ABHet=-1;ABHom=0.9976;AC=0;AF=0;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=1;MaxAASR=0.02041;NHet=0;NHomAlt=0;NHomRef=10;PASS_AC=0;PASS_AN=20;PASS_ratio=1;QD=0;RefLen=52;SDal=0,0;SeqDepth=842;VarType=XG GT:AD:MD:DP:GQ:PL 0/0:80,0:0:80:99:0,255,255 0/0:73,1:1:75:99:0,200,255 0/0:63,0:0:63:99:0,200,255 0/0:98,0:2:100:99:0,255,255 0/0:48,1:3:52:99:0,125,255 0/0:69,0:1:70:99:0,200,255 0/0:65,0:0:65:99:0,200,255 0/0:106,0:0:106:99:0,255,255 0/0:122,0:1:123:99:0,255,255 0/0:108,0:0:108:99:0,255,255
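Because the graphtyper record starts one base to the right of the svimmer record, searching for an exact position string with zgrep can miss it, and the string can also match elsewhere in a line (for example inside END=). A small window search on the CHROM and POS columns avoids both problems. This is a sketch: `records_near` and its default 5 bp flank are illustrative choices, and tab-separated VCF input is assumed.

```shell
# Hypothetical helper: print CHROM, POS, ID, REF, ALT and FILTER for records on
# the given chromosome whose POS lies within +/- flank bp of the given position.
records_near() {
  local vcf=$1 chrom=$2 pos=$3 flank=${4:-5}
  zgrep -v '^#' "$vcf" | awk -F'\t' -v OFS='\t' -v c="$chrom" -v p="$pos" -v f="$flank" \
    '$1 == c && $2 >= p - f && $2 <= p + f { print $1, $2, $3, $4, $5, $7 }'
}
```

Running `records_near <file> chr22 10950658` on both the svimmer and the graphtyper VCF would list the two representations of this deletion side by side.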
These DELs all had a CIGAR that includes an insertion (e.g., CIGAR=1M6I52D), so a CIGAR containing both an INS and a DEL may be related to the issue.
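Tallying the CIGAR operations shows how such a record can end up with a smaller net size than its SVLEN. For CIGAR=1M6I52D, the 52 bp deletion is partly offset by the 6 bp insertion: REF spans 53 bp and ALT 7 bp, a net change of -46 bp, which matches the graphtyper record above (RefLen=52, 6 bp ALT) even though svimmer reports SVLEN=-52. Whether graphtyper's SV annotation actually depends on this net change is an assumption, not confirmed here. `cigar_summary` is a hypothetical helper; it only counts M, I, D, = and X, which is what a REF/ALT CIGAR like this one uses.

```shell
# Hypothetical helper: sum CIGAR operations. REF is consumed by M, D, =, X and
# ALT by M, I, =, X; other operations (N, S, H, P) are ignored.
cigar_summary() {
  printf '%s\n' "$1" | awk '{
    s = $0; ins = 0; del = 0; ref = 0; alt = 0
    while (match(s, /^[0-9]+[MIDNSHP=X]/)) {
      n = substr(s, 1, RLENGTH - 1) + 0   # operation length
      op = substr(s, RLENGTH, 1)          # operation letter
      if (op == "I") { ins += n; alt += n }
      else if (op == "D") { del += n; ref += n }
      else if (op == "M" || op == "=" || op == "X") { ref += n; alt += n }
      s = substr(s, RLENGTH + 1)
    }
    printf "I=%d D=%d ref=%d alt=%d net=%d\n", ins, del, ref, alt, alt - ref
  }'
}

cigar_summary 1M6I52D   # prints: I=6 D=52 ref=53 alt=7 net=-46
```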
Here is a list of similar calls (all but one PASS) that are missing SVTYPE and MODEL annotations.
zgrep -v '^#' graphtyper/test/pVCF/graphtyper.autosome.raw.vcf.gz | grep -v SVTYPE | grep -v LowQD | cut -f1-8
chr22 11027397 chr22:11027397:XG GAA AAGAAAGAAAGAGAGAGAGAAAGAAAGAAAGAAAGATAGAGAGAGAGAAAG 402 PASS ABHet=0.3162;ABHom=0.8517;AC=4;AF=0.2;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=27;MaxAASR=0.4314;NHet=4;NHomAlt=0;NHomRef=6;PASS_AC=3;PASS_AN=14;PASS_ratio=0.7;QD=10.05;RefLen=3;SDal=0,0;SeqDepth=714;VarType=XG
chr22 11455435 chr22:11455435:XG ATGAGGGACAAACATTCAGACCACGGGAGCAGTGTTCTGGAATCCTACGT GA 211 PASS ABHet=0.45;ABHom=0.9167;AC=9;AF=0.45;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=7;MaxAASR=1;NHet=3;NHomAlt=3;NHomRef=4;PASS_AC=0;PASS_AN=4;PASS_ratio=0.2;QD=7.536;RefLen=50;SDal=0,0;SeqDepth=96;VarType=XG
chr22 16306115 chr22:16306115:IG GATTCCATTTGATGATGATTCTATTTGAGTCCATTCGATGATTCCATTTG T 135 PASS ABHet=0.2899;ABHom=1;AC=3;AF=0.15;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=8;MaxAASR=0.4;NHet=3;NHomAlt=0;NHomRef=7;PASS_AC=2;PASS_AN=18;PASS_ratio=0.9;QD=6.429;RefLen=50;SDal=0,0;SeqDepth=412;VarType=IG
chr22 17260105 chr22:17260105:IG G ACTTTAGCCTCCTGAGTCTATAGGTGCACACCACCACACCTATCCTCCCA 665 PASS ABHet=0.4676;ABHom=-1;AC=10;AF=0.5;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=9;MaxAASR=0.5556;NHet=10;NHomAlt=0;NHomRef=0;PASS_AC=10;PASS_AN=20;PASS_ratio=1;QD=9.779;RefLen=1;SDal=0,0;SeqDepth=142;VarType=IG
chr22 17756435 chr22:17756435:XG CTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTTCTTTCT TC 325 PASS ABHet=0.2692;ABHom=0.8913;AC=10;AF=0.5;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=7;MaxAASR=1;NHet=6;NHomAlt=2;NHomRef=2;PASS_AC=1;PASS_AN=2;PASS_ratio=0.1;QD=10.48;RefLen=51;SDal=0,0;SeqDepth=124;VarType=XG
chr22 20916632 chr22:20916632:XG AAGA GAAAAGAAAAGAAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAAAGAAAG 383 PASS ABHet=0.4595;ABHom=0.9714;AC=10;AF=0.5;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=6;MaxAASR=1;NHet=6;NHomAlt=2;NHomRef=2;PASS_AC=1;PASS_AN=4;PASS_ratio=0.2;QD=12.77;RefLen=4;SDal=0,0;SeqDepth=73;VarType=XG
chr22 23969007 chr22:23969007:XG AAAACTGTTACTCTAACAACAAGTGTTATACACTTACCATGTGCTAGGTCCTCTACAGGTACTTTACACTCATGATCCCATTTGATCCTTACAATCCCTATC CTTACTGAATGTCTAAAAAAACAAGTTTAAACTGTTTGTTACCCAAAGTTTGGTG 634 PASS ABHet=0.488;ABHom=0.9363;AC=4;AF=0.2;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=25;MaxAASR=0.8095;NHet=4;NHomAlt=0;NHomRef=6;PASS_AC=2;PASS_AN=12;PASS_ratio=0.6;QD=16.86;RefLen=102;SDal=0,0;SeqDepth=377;VarType=XG
chr22 24725832 chr22:24725832:XG TGGTTCCT ATAGGCGAAACTGCAGAGGGAATGCAATAAAAGGAAATCCCTGTGCTCCCCCTGAGG 674 PASS ABHet=0.3246;ABHom=0.9744;AC=8;AF=0.4;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=12;MaxAASR=1;NHet=4;NHomAlt=2;NHomRef=4;PASS_AC=2;PASS_AN=10;PASS_ratio=0.5;QD=12.48;RefLen=8;SDal=0,0;SeqDepth=280;VarType=XG
chr22 25781483 chr22:25781483:IG G ACCTGTGGTCCCAGCTACTCGGGAGGCTGAGGCAGAAGAATAGGTGGGCA 145 PASS ABHet=0.2472;ABHom=0.93;AC=3;AF=0.15;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=8;MaxAASR=0.2917;NHet=3;NHomAlt=0;NHomRef=7;PASS_AC=2;PASS_AN=14;PASS_ratio=0.7;QD=6.304;RefLen=1;SDal=0,0;SeqDepth=350;VarType=IG
chr22 26691836 chr22:26691836:IG TTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTT C 20 PASS ABHet=-1;ABHom=1;AC=2;AF=0.1;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=1;MaxAASR=1;NHet=0;NHomAlt=1;NHomRef=9;PASS_AC=0;PASS_AN=8;PASS_ratio=0.4;QD=20;RefLen=50;SDal=0,0;SeqDepth=117;VarType=IG
chr22 32988115 chr22:32988115:IG T CGGCCAACATGGATGGGCGGTTCACGAGGTCAAGAGATCAAGACCATCCC 174 PASS ABHet=0.2703;ABHom=0.984;AC=3;AF=0.15;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=9;MaxAASR=0.5;NHet=3;NHomAlt=0;NHomRef=7;PASS_AC=2;PASS_AN=16;PASS_ratio=0.8;QD=7.25;RefLen=1;SDal=0,0;SeqDepth=266;VarType=IG
chr22 34278980 chr22:34278980:IG AACATATATATATAATATATATAATATATAATATATATAAAATATATATA T 687 PASS ABHet=0.4324;ABHom=0.98;AC=12;AF=0.6;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=10;MaxAASR=1;NHet=4;NHomAlt=4;NHomRef=2;PASS_AC=4;PASS_AN=8;PASS_ratio=0.4;QD=16.36;RefLen=50;SDal=0,0;SeqDepth=90;VarType=IG
chr22 36751600 chr22:36751600:XG CATATATGTCATATATATCATATATATCATATATATATCATATATATCAT ATC 0 LowQUAL ABHet=-1;ABHom=1;AC=0;AF=0;AN=4;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=0;MaxAASR=0;NHet=0;NHomAlt=0;NHomRef=10;PASS_AC=0;PASS_AN=4;PASS_ratio=1;QD=0;RefLen=50;SDal=0,0;SeqDepth=41;VarType=XG
chr22 38083743 chr22:38083743:XG GGAGGGTGTACTCAGAGACAGGTGCACCAGGAGCCGGGGGCTGGGGATAG CGGCGCTCCTGC 510 PASS ABHet=0.4667;ABHom=1;AC=3;AF=0.15;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=49;MaxAASR=1;NHet=1;NHomAlt=1;NHomRef=8;PASS_AC=3;PASS_AN=20;PASS_ratio=1;QD=25;RefLen=50;SDal=0,0;SeqDepth=561;VarType=XG
chr22 39653084 chr22:39653084:XG GC TTCCCCCACACAGTGGCTAAGAGGGCTGACTGCATTGTGGGTGCACGGATT 765 PASS ABHet=0.5493;ABHom=1;AC=4;AF=0.2;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=46;MaxAASR=1;NHet=2;NHomAlt=1;NHomRef=7;PASS_AC=4;PASS_AN=20;PASS_ratio=1;QD=25;RefLen=2;SDal=0,0;SeqDepth=350;VarType=XG
chr22 41552567 chr22:41552567:XG GTAGTATTGA TTTTGTTTGAGATCACAGCTCACTGCAGCCTCTACCTCCTAGGCTCAAGT 1608 PASS ABHet=0.7458;ABHom=0.9492;AC=15;AF=0.75;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=16;MaxAASR=1;NHet=5;NHomAlt=5;NHomRef=0;PASS_AC=8;PASS_AN=10;PASS_ratio=0.5;QD=18.37;RefLen=10;SDal=0,0;SeqDepth=118;VarType=XG
chr22 48027078 chr22:48027078:XG TTCC CAGACCAGGCCAGACCGTGGTCTCGAGACCAGACCGTGGTCTAGAGACCAT 2190 PASS ABHet=0.5487;ABHom=1;AC=15;AF=0.75;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=46;MaxAASR=1;NHet=3;NHomAlt=6;NHomRef=1;PASS_AC=15;PASS_AN=20;PASS_ratio=1;QD=23.89;RefLen=4;SDal=0,0;SeqDepth=360;VarType=XG
chr22 49299624 chr22:49299624:XG CT TCAGCACAGCACAGCCATCAACTCCAGATCCTGGCCTGGGGCACTCCCTC 50 PASS ABHet=0.2941;ABHom=1;AC=1;AF=0.05;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=5;MaxAASR=0.2941;NHet=1;NHomAlt=0;NHomRef=9;PASS_AC=1;PASS_AN=20;PASS_ratio=1;QD=10;RefLen=2;SDal=0,0;SeqDepth=273;VarType=XG
chr22 50073048 chr22:50073048:IG TTCCCGGGCAGGCGTGGGCCCCTTCTCCGCAGTCCACCCGGCCATACCAT C,CTCCCGGGCAGGCGTGGGCCCCTTCTCGGCAGTCCACCCGGCCACACTGGTCCCGGGCAGGCGTGGGCCCCTTCTCCGCAGTCCACCCGGCCATACCAT 455 PASS ABHet=0.433;ABHom=0.9971;AC=0,2;AF=0,0.1;AN=20;CRal=0,0,0;MMal=0,0,0;MQSal=0,0,0;MaxAAS=1,25;MaxAASR=0.02703,0.5556;NHet=2;NHomAlt=0;NHomRef=8;PASS_AC=0,2;PASS_AN=20;PASS_ratio=1;QD=22.5;RefLen=50;SDal=0,0,0;SeqDepth=621;VarType=IG
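The `grep -v SVTYPE` filter above returns every record without an SV annotation, including ordinary short variants. To narrow it to records like these, where REF or some ALT allele is at least 50 bp long, a length-aware filter can also report each record's net ALT - REF change. This is a sketch: `long_alleles_without_svtype` is a hypothetical helper, 50 bp is the default only because it is the SV size threshold discussed in this thread, and tab-separated input is assumed.

```shell
# Hypothetical helper: list records with no SVTYPE in INFO where REF or any ALT
# allele is at least min_len bp; print CHROM, POS, FILTER, len(REF) and the
# net ALT - REF length for each ALT (comma-separated for multiallelic sites).
long_alleles_without_svtype() {
  local vcf=$1 min_len=${2:-50}
  zgrep -v '^#' "$vcf" | awk -F'\t' -v OFS='\t' -v min="$min_len" '
    $8 ~ /(^|;)SVTYPE=/ { next }
    {
      longest = length($4); nets = ""
      n = split($5, alts, ",")
      for (i = 1; i <= n; i++) {
        if (length(alts[i]) > longest) longest = length(alts[i])
        nets = nets (i > 1 ? "," : "") (length(alts[i]) - length($4))
      }
      if (longest >= min) print $1, $2, $7, length($4), nets
    }'
}
```

For the first record above (chr22:11027397, REF GAA, 51 bp ALT) this would report a net change of 48.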
So in each case graphtyper does call a variant, but without SVTYPE and MODEL annotations. Graphtyper appears to treat these variants as shorter than 50 bp, since the REF and ALT sequences are written out, whereas svimmer reports them with SVLEN >= 50 bp. It looks like graphtyper drops the first reference base relative to the original svimmer call, which results in an SVLEN < 50 bp, so the SV models are not output. For example, here is the svimmer record corresponding to the first call above (chr22:11027397):
chr22 11027396 . AGAA AAAGAAAGAAAGAGAGAGAGAAAGAAAGAAAGAAAGATAGAGAGAGAGAAAG 0 . END=11027399;SVTYPE=INS;SVLEN=51;CIGAR=1M51I3D;NUM_MERGED_SVS=322;STDDEV_POS=65.78,65.78
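Writing out the allele lengths of the two records makes the comparison concrete. In the svimmer record, REF and ALT both begin with the same base (A), so dropping it shortens each allele by one base and leaves the net ALT - REF change unchanged at 48 bp in both records. Svimmer's SVLEN=51 corresponds to the 51I operation of CIGAR=1M51I3D rather than to this net change, which may point back to the insertion-plus-deletion CIGARs noted earlier. `allele_lengths` is a hypothetical helper.

```shell
# Hypothetical helper: print len(REF), len(ALT) and the net change len(ALT) - len(REF).
allele_lengths() {
  local ref=$1 alt=$2
  echo "${#ref} ${#alt} $(( ${#alt} - ${#ref} ))"
}

# svimmer: chr22:11027396, SVLEN=51, CIGAR=1M51I3D
allele_lengths AGAA AAAGAAAGAAAGAGAGAGAGAAAGAAAGAAAGAAAGATAGAGAGAGAGAAAG   # prints: 4 52 48
# graphtyper: chr22:11027397, RefLen=3, no SVTYPE
allele_lengths GAA AAGAAAGAAAGAGAGAGAGAAAGAAAGAAAGAAAGATAGAGAGAGAGAAAG     # prints: 3 51 48
```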
@jjfarrell I also have a similar problem with a deletion. Did you find a solution?
@ValentinaPeona Not yet. Did your deletion also have a nearby INS in its CIGAR in this region?