graphtyper
graphtyper copied to clipboard
INFO/AAScore using graphtyper 2.7.4 is missing
When running Graphtyper 2.7, the info field output included the AAScore field along with the other QC fields. However, when we ran the newer 2.74, the AAScore was not found in the INFO field even though it was described in the VCF Header. Any suggestions with this?
zgrep Vers adsp_13710_pcrfree.graphtyper.chr1.vcf.gz ##graphtyperVersion=2.7.4 ##bcftools_concatVersion=1.10.2+htslib-1.10.2
[farrell@scc-wr3 workarea.ont_manta.adsp_13710_pcrfree]$ zcat adsp_13710_pcrfree.graphtyper.chr1.vcf.gz|grep -v ^#|head|cut -f1-8
chr1 10001 chr1:10001:UG N <DUP:SVSIZE=196:AGGREGATED> 3233292 LowPratio ABHet=0.5848;ABHom=0.9995;AC=27342;AF=0.9972;AN=27420;END=10197;MaxAAS=43;MaxAASR=1;MaxAltPP=0;NHet=32;NHomAlt=13655;NHomRef=23;NUM_MERGED_SVS=3;PASS_AC=0;PASS_AN=0;PASS_ratio=0;QD=23.33;RELATED_SV_ID=1;RefLen=1;SEQ=AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTAACCCTAACCC;SVLEN=196;SVMODEL=AGGREGATED;SVSIZE=196;SVTYPE=DUP;SV_ID=0;SeqDepth=316675;VarType=UG
chr1 10001 chr1:10001:UG.0 N <DUP:SVSIZE=196:BREAKPOINT1> 36613 LowPratio ABHet=0.3333;ABHom=0.986;AC=2808;AF=0.7883;AN=3562;END=10197;MaxAAS=4;MaxAASR=1;MaxAltPP=4;NHet=20;NHomAlt=1394;NHomRef=12296;NUM_MERGED_SVS=3;PASS_AC=0;PASS_AN=0;PASS_ratio=0;QD=21.88;RELATED_SV_ID=1;RefLen=1;SEQ=AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTAACCCTAACCC;SVLEN=196;SVMODEL=BREAKPOINT1;SVSIZE=196;SVTYPE=DUP;SV_ID=0;SeqDepth=2285;VarType=UG
chr1 10001 chr1:10001:UG.1 N <DUP:SVSIZE=196:BREAKPOINT2> 3106 LowPratio ABHet=0.2033;ABHom=0.9994;AC=231;AF=0.008739;AN=26434;END=10197;MaxAAS=3;MaxAASR=1;MaxAltPP=3;NHet=159;NHomAlt=36;NHomRef=13515;NUM_MERGED_SVS=3;PASS_AC=0;PASS_AN=0;PASS_ratio=0;QD=14.58;RELATED_SV_ID=0;RefLen=1;SEQ=AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTAACCCTAACCC;SVLEN=196;SVMODEL=BREAKPOINT2;SVSIZE=196;SVTYPE=DUP;SV_ID=1;SeqDepth=50612;VarType=UG
chr1 10001 chr1:10001:UG.2 N <DUP:SVSIZE=196:COVERAGE> 3235545 PASS ABHet=0.6227;ABHom=0.9994;AC=27373;AF=0.9983;AN=27420;END=10197;MaxAAS=43;MaxAASR=1;MaxAltPP=0;NHet=47;NHomAlt=13663;NHomRef=0;NUM_MERGED_SVS=3;PASS_AC=27127;PASS_AN=27142;PASS_ratio=0.9899;QD=23.31;RELATED_SV_ID=1;RefLen=1;SEQ=AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTAACCCTAACCC;SVLEN=196;SVMODEL=COVERAGE;SVSIZE=196;SVTYPE=DUP;SV_ID=0;SeqDepth=316924;VarType=UG
chr1 10111 chr1:10111:DG N <DEL:SVSIZE=66:AGGREGATED> 731 LowPratio ABHet=0.4095;ABHom=0.9999;AC=33;AF=0.001204;AN=27420;END=10177;MaxAAS=9;MaxAASR=1;MaxAltPP=9;NHet=9;NHomAlt=12;NHomRef=13689;NUM_MERGED_SVS=1;PASS_AC=0;PASS_AN=0;PASS_ratio=0;QD=8.807;RefLen=1;SVLEN=66;SVMODEL=AGGREGATED;SVSIZE=66;SVTYPE=DEL;SV_ID=2;SeqDepth=415129;VarType=DG
chr1 10111 chr1:10111:DG.0 N <DEL:SVSIZE=66:BREAKPOINT> 251590 LowPratio ABHet=0.611;ABHom=0.9987;AC=16742;AF=0.9624;AN=17396;END=10177;MaxAAS=9;MaxAASR=1;MaxAltPP=9;NHet=336;NHomAlt=8203;NHomRef=5171;NUM_MERGED_SVS=1;PASS_AC=3;PASS_AN=6;PASS_ratio=0.0003449;QD=13.97;RefLen=1;SVLEN=66;SVMODEL=BREAKPOINT;SVSIZE=66;SVTYPE=DEL;SV_ID=2;SeqDepth=18862;VarType=DG
chr1 10111 chr1:10111:DG.1 N <DEL:SVSIZE=66:COVERAGE> 124 LowQD;LowPratio ABHet=0.3592;ABHom=0.9998;AC=11;AF=0.0004012;AN=27420;END=10177;MaxAAS=7;MaxAASR=0.4667;MaxAltPP=0;NHet=11;NHomAlt=0;NHomRef=13699;NUM_MERGED_SVS=1;PASS_AC=0;PASS_AN=26414;PASS_ratio=0.9633;QD=2.431;RefLen=1;SVLEN=66;SVMODEL=COVERAGE;SVSIZE=66;SVTYPE=DEL;SV_ID=2;SeqDepth=415257;VarType=DG
chr1 10344 chr1:10344:DG N <DEL:SVSIZE=146:AGGREGATED> 19983 LowQD;LowPratio ABHet=0.369;ABHom=0.9895;AC=544;AF=0.01991;AN=27324;END=10490;MaxAAS=30;MaxAASR=0.7857;MaxAltPP=0;NHet=542;NHomAlt=1;NHomRef=13167;NUM_MERGED_SVS=1;PASS_AC=0;PASS_AN=0;PASS_ratio=0;QD=4.467;RefLen=1;SVLEN=146;SVMODEL=AGGREGATED;SVSIZE=146;SVTYPE=DEL;SV_ID=3;SeqDepth=375179;VarType=DG
chr1 10344 chr1:10344:DG.0 N <DEL:SVSIZE=146:BREAKPOINT> 0 LowQD;LowQUAL;LowPratio ABHet=-1;ABHom=1;AC=0;AF=0;AN=906;END=10490;MaxAAS=0;MaxAASR=0;MaxAltPP=0;NHet=0;NHomAlt=0;NHomRef=13710;NUM_MERGED_SVS=1;PASS_AC=0;PASS_AN=0;PASS_ratio=0;QD=0;RefLen=1;SVLEN=146;SVMODEL=BREAKPOINT;SVSIZE=146;SVTYPE=DEL;SV_ID=3;SeqDepth=619;VarType=DG
chr1 10344 chr1:10344:DG.1 N <DEL:SVSIZE=146:COVERAGE> 19986 LowQD ABHet=0.3688;ABHom=0.9887;AC=545;AF=0.01988;AN=27420;END=10490;MaxAAS=30;MaxAASR=0.7857;MaxAltPP=0;NHet=543;NHomAlt=1;NHomRef=13166;NUM_MERGED_SVS=1;PASS_AC=261;PASS_AN=25714;PASS_ratio=0.9378;QD=4.462;RefLen=1;SVLEN=146;SVMODEL=COVERAGE;SVSIZE=146;SVTYPE=DEL;SV_ID=3;SeqDepth=376397;VarType=DG
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=20211111
##source=Graphtyper
##graphtyperVersion=2.7
##graphtyperGitBranch=master
##graphtyperSHA1=ffb1ab2bea804973f197784783707dd46d7d5087
##contig=<ID=chr1,length=248956422>
##contig=<ID=chr2,length=242193529>
##contig=<ID=chr3,length=198295559>
[farrell@scc-wr3 workarea.manta.adsp_pcr_free_1]$ zcat adsp_pcr_free_1.graphtyper.chr1.graphtyper.chr1.vcf.gz|cut -f1-8|grep -v ^#|head
gzip: adsp_pcr_free_1.graphtyper.chr1.graphtyper.chr1.vcf.gz: No such file or directory
[farrell@scc-wr3 workarea.manta.adsp_pcr_free_1]$ zcat adsp_pcr_free_1.graphtyper.chr1.vcf.gz|cut -f1-8|grep -v ^#|head
chr1 54719 chr1:54719:UG N <DUP:SVSIZE=55:AGGREGATED> 860885 PASS AAScore=0.2428;ABHet=0.5053;ABHetMulti=0.5053,0.4947;ABHom=0.9977;ABHomMulti=0.9995,0.9976;AC=8787;AF=0.9584;AN=9168;CR=0;CRal=0,0;CRalt=0;END=54720;LOGF=0.9797;MMal=383422,17016;MMalt=0.0126195;MQ=34;MQSal=155211233,2510500;MQalt=4;MQsquared=162917433;MaxAAS=93;MaxAASR=1;MaxAltPP=1;NHet=57;NHomAlt=4365;NHomRef=252;NUM_MERGED_SVS=956;PASS_AC=6187;PASS_AN=6448;PASS_ratio=0.7033;QD=20.04;QDalt=20.04;RELATED_SV_ID=1;RefLen=1;SB=0.4796;SBAlt=0.363;SBF=30338,253;SBF1=16583,140;SBF2=13755,113;SBR=32747,444;SBR1=16548,222;SBR2=16199,222;SDal=1317798,34678;SDalt=0.257181;SEQ=TCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTT;SVLEN=55;SVMODEL=AGGREGATED;SVSIZE=55;SVTYPE=DUP;SV_ID=0;SeqDepth=141743;VarType=UG
chr1 54719 chr1:54719:UG.0 N <DUP:SVSIZE=55:BREAKPOINT1> 2676 LowABHet;LowQD AAScore=0.8248;ABHet=0.1229;ABHetMulti=0.1229,0.8771;ABHom=0.9943;ABHomMulti=0.9943,0.5;AC=206;AF=0.02282;AN=9026;CR=0;CRal=0,0;CRalt=0;END=54720;LOGF=0.002512;MMal=383422,17016;MMalt=2.44132;MQ=50
Looking back at the documentation/code, it looks like the Graphtyper AAScore was originally intended just for SNPs and INDELs and therefore AASCore was intently dropped for SVs in recent versions. The training for the logistic regression model used in AAScore appears based on the GIAB SNV and INDELs. Did I get that right?
To help with SV QC, could a AAScore be developed for the SVs using the GIAB SVs as a training set also? Maybe consider a model for each variant type (SNV, DEL, DUP, INS) to increase the accuracy of the AAScore. For example, GATK pipeline runs two VQSR models trained on SNVs and small INDELs separately with different QC metrics.