graphtyper icon indicating copy to clipboard operation
graphtyper copied to clipboard

INFO/AAScore using graphtyper 2.7.4 is missing

Open jjfarrell opened this issue 1 year ago • 1 comments

When running Graphtyper 2.7, the info field output included the AAScore field along with the other QC fields. However, when we ran the newer 2.74, the AAScore was not found in the INFO field even though it was described in the VCF Header. Any suggestions with this?

zgrep Vers adsp_13710_pcrfree.graphtyper.chr1.vcf.gz ##graphtyperVersion=2.7.4 ##bcftools_concatVersion=1.10.2+htslib-1.10.2

[farrell@scc-wr3 workarea.ont_manta.adsp_13710_pcrfree]$ zcat adsp_13710_pcrfree.graphtyper.chr1.vcf.gz|grep -v ^#|head|cut -f1-8

chr1    10001   chr1:10001:UG   N       <DUP:SVSIZE=196:AGGREGATED>     3233292 LowPratio       ABHet=0.5848;ABHom=0.9995;AC=27342;AF=0.9972;AN=27420;END=10197;MaxAAS=43;MaxAASR=1;MaxAltPP=0;NHet=32;NHomAlt=13655;NHomRef=23;NUM_MERGED_SVS=3;PASS_AC=0;PASS_AN=0;PASS_ratio=0;QD=23.33;RELATED_SV_ID=1;RefLen=1;SEQ=AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTAACCCTAACCC;SVLEN=196;SVMODEL=AGGREGATED;SVSIZE=196;SVTYPE=DUP;SV_ID=0;SeqDepth=316675;VarType=UG
chr1    10001   chr1:10001:UG.0 N       <DUP:SVSIZE=196:BREAKPOINT1>    36613   LowPratio       ABHet=0.3333;ABHom=0.986;AC=2808;AF=0.7883;AN=3562;END=10197;MaxAAS=4;MaxAASR=1;MaxAltPP=4;NHet=20;NHomAlt=1394;NHomRef=12296;NUM_MERGED_SVS=3;PASS_AC=0;PASS_AN=0;PASS_ratio=0;QD=21.88;RELATED_SV_ID=1;RefLen=1;SEQ=AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTAACCCTAACCC;SVLEN=196;SVMODEL=BREAKPOINT1;SVSIZE=196;SVTYPE=DUP;SV_ID=0;SeqDepth=2285;VarType=UG
chr1    10001   chr1:10001:UG.1 N       <DUP:SVSIZE=196:BREAKPOINT2>    3106    LowPratio       ABHet=0.2033;ABHom=0.9994;AC=231;AF=0.008739;AN=26434;END=10197;MaxAAS=3;MaxAASR=1;MaxAltPP=3;NHet=159;NHomAlt=36;NHomRef=13515;NUM_MERGED_SVS=3;PASS_AC=0;PASS_AN=0;PASS_ratio=0;QD=14.58;RELATED_SV_ID=0;RefLen=1;SEQ=AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTAACCCTAACCC;SVLEN=196;SVMODEL=BREAKPOINT2;SVSIZE=196;SVTYPE=DUP;SV_ID=1;SeqDepth=50612;VarType=UG
chr1    10001   chr1:10001:UG.2 N       <DUP:SVSIZE=196:COVERAGE>       3235545 PASS    ABHet=0.6227;ABHom=0.9994;AC=27373;AF=0.9983;AN=27420;END=10197;MaxAAS=43;MaxAASR=1;MaxAltPP=0;NHet=47;NHomAlt=13663;NHomRef=0;NUM_MERGED_SVS=3;PASS_AC=27127;PASS_AN=27142;PASS_ratio=0.9899;QD=23.31;RELATED_SV_ID=1;RefLen=1;SEQ=AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTAACCCTAACCC;SVLEN=196;SVMODEL=COVERAGE;SVSIZE=196;SVTYPE=DUP;SV_ID=0;SeqDepth=316924;VarType=UG
chr1    10111   chr1:10111:DG   N       <DEL:SVSIZE=66:AGGREGATED>      731     LowPratio       ABHet=0.4095;ABHom=0.9999;AC=33;AF=0.001204;AN=27420;END=10177;MaxAAS=9;MaxAASR=1;MaxAltPP=9;NHet=9;NHomAlt=12;NHomRef=13689;NUM_MERGED_SVS=1;PASS_AC=0;PASS_AN=0;PASS_ratio=0;QD=8.807;RefLen=1;SVLEN=66;SVMODEL=AGGREGATED;SVSIZE=66;SVTYPE=DEL;SV_ID=2;SeqDepth=415129;VarType=DG
chr1    10111   chr1:10111:DG.0 N       <DEL:SVSIZE=66:BREAKPOINT>      251590  LowPratio       ABHet=0.611;ABHom=0.9987;AC=16742;AF=0.9624;AN=17396;END=10177;MaxAAS=9;MaxAASR=1;MaxAltPP=9;NHet=336;NHomAlt=8203;NHomRef=5171;NUM_MERGED_SVS=1;PASS_AC=3;PASS_AN=6;PASS_ratio=0.0003449;QD=13.97;RefLen=1;SVLEN=66;SVMODEL=BREAKPOINT;SVSIZE=66;SVTYPE=DEL;SV_ID=2;SeqDepth=18862;VarType=DG
chr1    10111   chr1:10111:DG.1 N       <DEL:SVSIZE=66:COVERAGE>        124     LowQD;LowPratio ABHet=0.3592;ABHom=0.9998;AC=11;AF=0.0004012;AN=27420;END=10177;MaxAAS=7;MaxAASR=0.4667;MaxAltPP=0;NHet=11;NHomAlt=0;NHomRef=13699;NUM_MERGED_SVS=1;PASS_AC=0;PASS_AN=26414;PASS_ratio=0.9633;QD=2.431;RefLen=1;SVLEN=66;SVMODEL=COVERAGE;SVSIZE=66;SVTYPE=DEL;SV_ID=2;SeqDepth=415257;VarType=DG
chr1    10344   chr1:10344:DG   N       <DEL:SVSIZE=146:AGGREGATED>     19983   LowQD;LowPratio ABHet=0.369;ABHom=0.9895;AC=544;AF=0.01991;AN=27324;END=10490;MaxAAS=30;MaxAASR=0.7857;MaxAltPP=0;NHet=542;NHomAlt=1;NHomRef=13167;NUM_MERGED_SVS=1;PASS_AC=0;PASS_AN=0;PASS_ratio=0;QD=4.467;RefLen=1;SVLEN=146;SVMODEL=AGGREGATED;SVSIZE=146;SVTYPE=DEL;SV_ID=3;SeqDepth=375179;VarType=DG
chr1    10344   chr1:10344:DG.0 N       <DEL:SVSIZE=146:BREAKPOINT>     0       LowQD;LowQUAL;LowPratio ABHet=-1;ABHom=1;AC=0;AF=0;AN=906;END=10490;MaxAAS=0;MaxAASR=0;MaxAltPP=0;NHet=0;NHomAlt=0;NHomRef=13710;NUM_MERGED_SVS=1;PASS_AC=0;PASS_AN=0;PASS_ratio=0;QD=0;RefLen=1;SVLEN=146;SVMODEL=BREAKPOINT;SVSIZE=146;SVTYPE=DEL;SV_ID=3;SeqDepth=619;VarType=DG
chr1    10344   chr1:10344:DG.1 N       <DEL:SVSIZE=146:COVERAGE>       19986   LowQD   ABHet=0.3688;ABHom=0.9887;AC=545;AF=0.01988;AN=27420;END=10490;MaxAAS=30;MaxAASR=0.7857;MaxAltPP=0;NHet=543;NHomAlt=1;NHomRef=13166;NUM_MERGED_SVS=1;PASS_AC=261;PASS_AN=25714;PASS_ratio=0.9378;QD=4.462;RefLen=1;SVLEN=146;SVMODEL=COVERAGE;SVSIZE=146;SVTYPE=DEL;SV_ID=3;SeqDepth=376397;VarType=DG
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=20211111
##source=Graphtyper
##graphtyperVersion=2.7
##graphtyperGitBranch=master
##graphtyperSHA1=ffb1ab2bea804973f197784783707dd46d7d5087
##contig=<ID=chr1,length=248956422>
##contig=<ID=chr2,length=242193529>
##contig=<ID=chr3,length=198295559>
[farrell@scc-wr3 workarea.manta.adsp_pcr_free_1]$ zcat   adsp_pcr_free_1.graphtyper.chr1.graphtyper.chr1.vcf.gz|cut -f1-8|grep -v ^#|head
gzip: adsp_pcr_free_1.graphtyper.chr1.graphtyper.chr1.vcf.gz: No such file or directory
[farrell@scc-wr3 workarea.manta.adsp_pcr_free_1]$ zcat   adsp_pcr_free_1.graphtyper.chr1.vcf.gz|cut -f1-8|grep -v ^#|head
chr1    54719   chr1:54719:UG   N       <DUP:SVSIZE=55:AGGREGATED>      860885  PASS    AAScore=0.2428;ABHet=0.5053;ABHetMulti=0.5053,0.4947;ABHom=0.9977;ABHomMulti=0.9995,0.9976;AC=8787;AF=0.9584;AN=9168;CR=0;CRal=0,0;CRalt=0;END=54720;LOGF=0.9797;MMal=383422,17016;MMalt=0.0126195;MQ=34;MQSal=155211233,2510500;MQalt=4;MQsquared=162917433;MaxAAS=93;MaxAASR=1;MaxAltPP=1;NHet=57;NHomAlt=4365;NHomRef=252;NUM_MERGED_SVS=956;PASS_AC=6187;PASS_AN=6448;PASS_ratio=0.7033;QD=20.04;QDalt=20.04;RELATED_SV_ID=1;RefLen=1;SB=0.4796;SBAlt=0.363;SBF=30338,253;SBF1=16583,140;SBF2=13755,113;SBR=32747,444;SBR1=16548,222;SBR2=16199,222;SDal=1317798,34678;SDalt=0.257181;SEQ=TCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTT;SVLEN=55;SVMODEL=AGGREGATED;SVSIZE=55;SVTYPE=DUP;SV_ID=0;SeqDepth=141743;VarType=UG
chr1    54719   chr1:54719:UG.0 N       <DUP:SVSIZE=55:BREAKPOINT1>     2676    LowABHet;LowQD  AAScore=0.8248;ABHet=0.1229;ABHetMulti=0.1229,0.8771;ABHom=0.9943;ABHomMulti=0.9943,0.5;AC=206;AF=0.02282;AN=9026;CR=0;CRal=0,0;CRalt=0;END=54720;LOGF=0.002512;MMal=383422,17016;MMalt=2.44132;MQ=50

jjfarrell avatar Aug 17 '22 21:08 jjfarrell

Looking back at the documentation/code, it looks like the Graphtyper AAScore was originally intended just for SNPs and INDELs and therefore AASCore was intently dropped for SVs in recent versions. The training for the logistic regression model used in AAScore appears based on the GIAB SNV and INDELs. Did I get that right?

To help with SV QC, could a AAScore be developed for the SVs using the GIAB SVs as a training set also? Maybe consider a model for each variant type (SNV, DEL, DUP, INS) to increase the accuracy of the AAScore. For example, GATK pipeline runs two VQSR models trained on SNVs and small INDELs separately with different QC metrics.

jjfarrell avatar Aug 18 '22 11:08 jjfarrell