truvari
Disparity between bench, phab and ga4gh query vcfs?
A variant shows up in the original fp.vcf.gz, and shows up in phab.output.vcf.gz, but does not show up in phab_bench/tp-comp.vcf.gz or phab_bench/fp.vcf.gz. This variant then does not show up at all in the ga4gh output when using the --with-refine flag.
Truvari v4.2.2.
Commands used (file names simplified):
truvari bench -b truth.vcf.gz -c dysgu.vcf.gz -o dysgu --pctseq 0 --typeignore --passonly --includebed truthbed.bed
truvari refine --threads 4 -R -U -f reference.fasta --align wfa --regions dysgu/candidate.refine.bed dysgu
truvari ga4gh -i dysgu -w -o ga4gh-dysgu
The variant in fp.vcf.gz:
chrX 18868815 498284 C <INS> . PASS SVMETHOD=DYSGUv1.6.2;SVTYPE=INS;END=18868815;CHR2=chrX;GRP=498284;NGRP=1;CT=5to3;CIPOS95=0;CIEND95=0;SVLEN=385;CONTIGA=tgtttttgcccccggagtcttgctctgtcgcccaggctggagtgcggtggcgcCATCTCGGCTCACTGCAAGCTCCGCCTCCCGGGTTCACGCCATTCTCCTGCCTCAGTCTGCCCAGTAGCTGGGACTATAGGCGCCCACCACCATGCCTGGCTAATTTTTTTGTATTTTTAGTAGAGATTGGG;LEFT_SVINSSEQ=tgtttttgcccccggagtcttgctctgtcgcccaggctggagtgcggtggcgc;KIND=extra-regional;GC=57.84;NEXP=0;STRIDE=0;EXPSEQ;RPOLY=14;OL=0;SU=13;WR=0;PE=0;SR=0;SC=13;BND=13;LPREC=0;RT=pe;PctSeqSimilarity=0;PctSizeSimilarity=0.3247;PctRecOverlap=0.3247;SizeDiff=-260;StartDistance=-1;EndDistance=-1;GTMatch=0;TruScore=21;MatchId=10211.0,10211.0 GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB 1/1:91:40.62:13:0:0:0:13:13:29.54:3:7:6:0:0:24:1.192:0.963:1.148:0.614
The variant(s) in phab.output.vcf.gz:
chrX 18868814 . A AAATTCTCTTCTGATTTCTGTTGGTTTATTATTTTGCTGTTTTTTTCCCTAATTTCTTTTTTTTTTTTTTTTTTTTTTTTTGAGACGGAGTCTTGCTCTGTCGCCCAGGCTGGAGTGCGGTGGCGC . . . GT 1/1 0/0
chrX 18868815 . C <INS> . . . GT 0/0 1/1
The region is also in the candidate BED file: chrX 18868803 18868825.
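As a quick sanity check on the coordinates, here is a minimal sketch of the plain interval test, assuming the usual conventions (BED is 0-based half-open, VCF POS is 1-based). The function name `pos_in_bed_interval` is hypothetical, and this is not Truvari's own within/outside boundary logic, which may treat the edges differently.

```python
def pos_in_bed_interval(vcf_pos, bed_start, bed_end):
    """Return True if a 1-based VCF POS falls inside a 0-based, half-open BED interval."""
    zero_based = vcf_pos - 1  # convert VCF POS to a 0-based coordinate
    return bed_start <= zero_based < bed_end


if __name__ == "__main__":
    # Candidate region from the BED file: chrX 18868803 18868825
    for pos in (18868814, 18868815):
        print(pos, pos_in_bed_interval(pos, 18868803, 18868825))
```

Under these conventions both 18868814 (the truth and harmonized call) and 18868815 (the original dysgu call) fall inside the candidate region.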
chrX 18868814 is the "expected" position for this variant, so I would expect the refined result to classify it as a TP. However, I can't find it in the ga4gh output at all. The variant in the GIAB HG002 truth set is:
chrX 18868814 HG3_PB_assemblyticsfalcon_28664 A AAATTCTCTTCTGATTTCTGTTGGTTTATTATTTTGCTGTTTTTTTCCCTAATTTCTTTTTTTTTTTTTTTTTTTTTTTTTGAGACGGAGTCTTGCTCTGTCGCCCAGGCTGGAGTGCGGTGGCGC 20 PASS BREAKSIMLENGTH=0;CGcalls=0;CGexactcalls=0;ClusterIDs=HG2_PB_pbsv_20069:HG2_PB_HySA_36018:HG4_PB_pbsv_20008:HG4_PB_HySA_28613:HG2_PB_SVrefine2Falcon2Bionano_9967:HG4_Ill_GATKHCSBGrefine_13354:HG3_PB_assemblyticsfalcon_28664:HG3_PB_SVrefine2Falcon1Dovetail_12497:HG2_PB_assemblyticsPBcR_27931:HG2_PB_SVrefine2PBcRplusDovetail_12040:HG2_PB_assemblyticsfalcon_28686:HG2_PB_SVrefine2Falcon1plusDovetail_12867:HG4_PB_assemblyticsfalcon_28135:HG4_PB_SVrefine2Falcon1Dovetail_12510:HG4_PB_SVrefine2PBcRDovetail_10079;ClusterMaxEditDist=0.200772;ClusterMaxShiftDist=0.0705882;ClusterMaxSizeDiff=0.054902;DistBack=11492;DistForward=-1;DistMin=-1;DistMinlt1000=TRUE;DistPASSHG2gt49Minlt1000=FALSE;DistPASSMinlt1000=FALSE;END=18868814;ExactMatchIDs=HG2_PB_SVrefine2PBcRplusDovetail_12040:HG2_PB_assemblyticsPBcR_27931:HG3_PB_SVrefine2Falcon1Dovetail_12497:HG3_PB_assemblyticsfalcon_28664:HG4_Ill_GATKHCSBGrefine_13354;HG003_GT=0/0;HG004_GT=1/1;HG2count=7;HG3count=2;HG4count=6;Illcalls=1;Illexactcalls=1;MendelianError=TRUE;MultiTech=TRUE;MultiTechExact=TRUE;NumClusterSVs=15;NumExactMatchSVs=5;NumTechs=2;NumTechsExact=2;PBcalls=14;PBexactcalls=4;REFWIDENED=X:18886933-18886932;REPTYPE=SIMPLEINS;SVLEN=125;SVTYPE=INS;TRall=FALSE;TRgt100=FALSE;TRgt10k=FALSE;TenXcalls=0;TenXexactcalls=0;segdup=FALSE;sizecat=100to299 GT:GTcons1:PB_GT:PB_REF:PB_ALT:PBHP_GT:PB_REF_HP1:PB_ALT_HP1:PB_REF_HP2:PB_ALT_HP2:TenX_GT:TenX_REF_HP1:TenX_ALT_HP1:TenX_REF_HP2:TenX_ALT_HP2:ILL250bp_GT:ILL250bp_REF:ILL250bp_ALT:ILLMP_GT:ILLMP_REF:ILLMP_ALT:BNG_LEN_DEL:BNG_LEN_INS:nabsys_svm 1/1:1/1:1/1:0:35:./.:0:0:0:0:./.:0:0:0:0:1/1:0:12:./.:.:.:.:.:.
My understanding of the behaviour is that since this variant is being harmonized, it should be making it to either the phab_bench/tp-comp.vcf.gz or the phab_bench/fp.vcf.gz files and into ga4gh's query vcf, so I am confused why this is not the case.
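For anyone wanting to reproduce the check, this is roughly how I looked for the variant across the outputs: a minimal sketch using only the Python standard library. The helper names `find_records` and `report` are hypothetical, and the paths under `dysgu/` are my assumption about how the files named above are laid out; adjust them to your run.

```python
import gzip
from pathlib import Path


def find_records(vcf_path, chrom, start, end):
    """Return (POS, ID, REF, ALT) for records on chrom with start <= POS <= end (1-based)."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    hits = []
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # skip malformed lines
            if fields[0] == chrom and start <= int(fields[1]) <= end:
                hits.append((int(fields[1]), fields[2], fields[3], fields[4]))
    return hits


def report(paths, chrom, start, end):
    """Map each path to its matching records, or None if the file does not exist."""
    result = {}
    for path in paths:
        if Path(path).exists():
            result[str(path)] = find_records(path, chrom, start, end)
        else:
            result[str(path)] = None
    return result


if __name__ == "__main__":
    # Assumed locations of the files mentioned in this issue.
    files = [
        "dysgu/fp.vcf.gz",
        "dysgu/phab.output.vcf.gz",
        "dysgu/phab_bench/tp-comp.vcf.gz",
        "dysgu/phab_bench/fp.vcf.gz",
    ]
    for name, hits in report(files, "chrX", 18868803, 18868825).items():
        print(name, "missing file" if hits is None else hits)
```

The window is the candidate region widened by eye, treated as 1-based inclusive on POS, which is enough to see which files contain a record near chrX:18868814.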
Hello,
Before I dig into this, could you confirm this behavior is present in Truvari v4.3.1? There were recent changes to the within/outside boundaries logic that may change what you're seeing. If you do still see the error, could you also include the truth.vcf.gz, dysgu.vcf.gz, and truthbed.bed subset to your example region so that I can fully debug this?
Thanks, ~/Adam
Please reopen if there are any updates.