gatk
gatk copied to clipboard
ApplyVQSR Exception thrown at a/some variation site(s) after INDEL VQSR
Bug Report
Affected tool(s) or class(es)
ApplyVQSR, org.broadinstitute.hellbender.exceptions.GATKException
Affected version(s)
4.5.0.0 (gatk-package-4.5.0.0-local.jar) HTSJDK Version: 4.1.0 Picard Version: 3.1.1
Description
It Encountered input variant which isn't found in the input recal file. And suggested to make sure VariantRecalibrator and ApplyVQSR were run on the same set of input variants. ### I'm sure with this.
Steps to reproduce
- step1. E_J_HC.2.vcf.gz as input variation to get EJ.HC.snps.recal file, using VariantRecalibrator in SNP mode
- step2. E_J_HC.2.vcf.gz as input variation and recal-file EJ.HC.snps.recal to get SNP recalibrated vcf file [EJ.HC.snps.VQSR.vcf.gz], using ApplyVQSR in SNP mode
- step3. EJ.HC.snps.VQSR.vcf.gz as input variation to get EJ.HC.snps.indels.recal file, using VariantRecalibrator in INDEL mode, with --max-gaussians 6; Tool returned: true
- step4. EJ.HC.snps.VQSR.vcf.gz as input variation and recal-file EJ.HC.snps.indels.recal to get INDEL recalibrated vcf file EJ.HC.vqsr.vcf.gz, using ApplyVQSR
Expected behavior
Give a recalibrated vcf file ( both SNPs and INDELs )
Actual behavior
Stopped at step4 at position 1:21246 when writing the recalibrated vcf file EJ.HC.vqsr.vcf.gz.
org.broadinstitute.hellbender.exceptions.GATKException: Exception thrown at 1:21246 [VC EJ.HC.snps.VQSR.vcf.gz @ 1:21246 Q2489.48 of type=SNP alleles=[G*, T] attr={AC=2, AF=0.500, AN=4, DP=155, ExcessHet=0.0000, FS=0.000, MLEAC=2, MLEAF=0.500, MQ=60.00, QD=27.97, SOR=1.204, VQSLOD=-3.754e-01, culprit=QD} GT=GT:AD:DP:GQ:PL 0/0:64,0:64:99:0,120,1800 1/1:0,89:89:99:2505,267,0 filters= at org.broadinstitute.hellbender.engine.MultiVariantWalker.lambda$traverse$1(MultiVariantWalker.java:145) at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179) at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1845) at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596) at org.broadinstitute.hellbender.engine.MultiVariantWalker.traverse(MultiVariantWalker.java:136) at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1098) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217) at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166) at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209) at org.broadinstitute.hellbender.Main.main(Main.java:306) Caused by: org.broadinstitute.hellbender.exceptions.UserException: Encountered input variant which isn't found in the input recal file. Please make sure VariantRecalibrator and ApplyVQSR were run on the same set of input variants. First seen at: [VC EJ.HC.snps.VQSR.vcf.gz @ 1:21246 Q2489.48 of type=SNP alleles=[G*, T] attr={AC=2, AF=0.500, AN=4, DP=155, ExcessHet=0.0000, FS=0.000, MLEAC=2, MLEAF=0.500, MQ=60.00, QD=27.97, SOR=1.204, VQSLOD=-3.754e-01, culprit=QD} GT=GT:AD:DP:GQ:PL 0/0:64,0:64:99:0,120,1800 1/1:0,89:89:99:2505,267,0 filters=
what I checked:
-
I noticed there is a similar closed issue #8054 , but I didn't know how it was solved
-
it indeed has the SNP 1:21246 in EJ.HC.snps.VQSR.vcf.gz but not in the recal-file EJ.HC.snps.VQSR.vcf.gz
1 21246 . G T 2489.48 PASS AC=2;AF=0.5;AN=4;DP=155;ExcessHet=0;FS=0;MLEAC=2;MLEAF=0.5;MQ=60;QD=27.97;SOR=1.204;VQSLOD=-0.3754;culprit=QD GT:AD:DP:GQ:PL 0/0:64,0:64:99:0,120,1800 1/1:0,89:89:99:2505,267,0 -
I checked the code for the four steps,and I'm sure I used the same input variation file in VariantRecalibrator and ApplyVQSR. The code I used is shown below:
gatk --java-options "-Xmx800g" VariantRecalibrator \
-R ref.fa \
-V E_J_HC.2.vcf.gz \
-resource:conc,known=true,training=true,truth=true,prior=10.0 conc.snp.PASS.vcf.gz \
-an DP -an QD -an FS -an SOR -an ReadPosRankSum -an MQRankSum \
-mode SNP \
-tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 95.0 -tranche 90.0 \
--rscript-file EJ.HC.snps.plots.R \
--tranches-file EJ.HC.snps.tranches \
-O EJ.HC.snps.recal && \
gatk --java-options "-Xmx800g" ApplyVQSR \
-R ref.fa \
-V E_J_HC.2.vcf.gz \
--truth-sensitivity-filter-level 99.0 \
--tranches-file EJ.HC.snps.tranches \
--recal-file EJ.HC.snps.recal \
-mode SNP \
-O EJ.HC.snps.VQSR.vcf.gz
gatk --java-options "-Xmx800g" VariantRecalibrator \
-R ref.fa \
-V EJ.HC.snps.VQSR.vcf.gz \
-resource:conc,known=true,training=true,truth=true,prior=10.0 conc.indel.PASS.vcf.gz \
-an DP -an QD -an FS -an SOR -an ReadPosRankSum -an MQRankSum \
-mode INDEL \
--max-gaussians 6 \
--rscript-file EJ.HC.snps.indels.plots.R \
--tranches-file EJ.HC.snps.indels.tranches \
-O EJ.HC.snps.indels.recal && \
gatk --java-options "-Xmx800g" ApplyVQSR \
-R ref.fa \
-V EJ.HC.snps.VQSR.vcf.gz \
--truth-sensitivity-filter-level 99.0 \
--tranches-file EJ.HC.snps.indels.tranches \
--recal-file EJ.HC.snps.indels.recal \
-O EJ.HC.vqsr.vcf.gz