gatk icon indicating copy to clipboard operation
gatk copied to clipboard

ApplyVQSR Exception thrown at a/some variation site(s) after INDEL VQSR

Open Dkyuan opened this issue 1 year ago • 0 comments

Bug Report

Affected tool(s) or class(es)

ApplyVQSR, org.broadinstitute.hellbender.exceptions.GATKException

Affected version(s)

4.5.0.0 (gatk-package-4.5.0.0-local.jar) HTSJDK Version: 4.1.0 Picard Version: 3.1.1

Description

It Encountered input variant which isn't found in the input recal file. And suggested to make sure VariantRecalibrator and ApplyVQSR were run on the same set of input variants. ### I'm sure with this.

Steps to reproduce

  1. step1. E_J_HC.2.vcf.gz as input variation to get EJ.HC.snps.recal file, using VariantRecalibrator in SNP mode
  2. step2. E_J_HC.2.vcf.gz as input variation and recal-file EJ.HC.snps.recal to get SNP recalibrated vcf file [EJ.HC.snps.VQSR.vcf.gz], using ApplyVQSR in SNP mode
  3. step3. EJ.HC.snps.VQSR.vcf.gz as input variation to get EJ.HC.snps.indels.recal file, using VariantRecalibrator in INDEL mode, with --max-gaussians 6; Tool returned: true
  4. step4. EJ.HC.snps.VQSR.vcf.gz as input variation and recal-file EJ.HC.snps.indels.recal to get INDEL recalibrated vcf file EJ.HC.vqsr.vcf.gz, using ApplyVQSR

Expected behavior

Give a recalibrated vcf file ( both SNPs and INDELs )

Actual behavior

Stopped at step4 at position 1:21246 when writing the recalibrated vcf file EJ.HC.vqsr.vcf.gz. org.broadinstitute.hellbender.exceptions.GATKException: Exception thrown at 1:21246 [VC EJ.HC.snps.VQSR.vcf.gz @ 1:21246 Q2489.48 of type=SNP alleles=[G*, T] attr={AC=2, AF=0.500, AN=4, DP=155, ExcessHet=0.0000, FS=0.000, MLEAC=2, MLEAF=0.500, MQ=60.00, QD=27.97, SOR=1.204, VQSLOD=-3.754e-01, culprit=QD} GT=GT:AD:DP:GQ:PL 0/0:64,0:64:99:0,120,1800 1/1:0,89:89:99:2505,267,0 filters= at org.broadinstitute.hellbender.engine.MultiVariantWalker.lambda$traverse$1(MultiVariantWalker.java:145) at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179) at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1845) at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596) at org.broadinstitute.hellbender.engine.MultiVariantWalker.traverse(MultiVariantWalker.java:136) at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1098) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217) at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166) at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209) at org.broadinstitute.hellbender.Main.main(Main.java:306) Caused by: org.broadinstitute.hellbender.exceptions.UserException: Encountered input variant which isn't found in the input recal file. Please make sure VariantRecalibrator and ApplyVQSR were run on the same set of input variants. First seen at: [VC EJ.HC.snps.VQSR.vcf.gz @ 1:21246 Q2489.48 of type=SNP alleles=[G*, T] attr={AC=2, AF=0.500, AN=4, DP=155, ExcessHet=0.0000, FS=0.000, MLEAC=2, MLEAF=0.500, MQ=60.00, QD=27.97, SOR=1.204, VQSLOD=-3.754e-01, culprit=QD} GT=GT:AD:DP:GQ:PL 0/0:64,0:64:99:0,120,1800 1/1:0,89:89:99:2505,267,0 filters=


what I checked:

  1. I noticed there is a similar closed issue #8054 , but I didn't know how it was solved

  2. it indeed has the SNP 1:21246 in EJ.HC.snps.VQSR.vcf.gz but not in the recal-file EJ.HC.snps.VQSR.vcf.gz 1 21246 . G T 2489.48 PASS AC=2;AF=0.5;AN=4;DP=155;ExcessHet=0;FS=0;MLEAC=2;MLEAF=0.5;MQ=60;QD=27.97;SOR=1.204;VQSLOD=-0.3754;culprit=QD GT:AD:DP:GQ:PL 0/0:64,0:64:99:0,120,1800 1/1:0,89:89:99:2505,267,0

  3. I checked the code for the four steps,and I'm sure I used the same input variation file in VariantRecalibrator and ApplyVQSR. The code I used is shown below:

gatk --java-options "-Xmx800g" VariantRecalibrator \
       -R ref.fa \
       -V E_J_HC.2.vcf.gz \
       -resource:conc,known=true,training=true,truth=true,prior=10.0 conc.snp.PASS.vcf.gz \
       -an DP -an QD -an FS -an SOR -an ReadPosRankSum -an MQRankSum \
       -mode SNP \
       -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 95.0 -tranche 90.0 \
       --rscript-file EJ.HC.snps.plots.R \
       --tranches-file EJ.HC.snps.tranches \
       -O EJ.HC.snps.recal && \
gatk --java-options "-Xmx800g" ApplyVQSR \
        -R ref.fa \
        -V E_J_HC.2.vcf.gz \
        --truth-sensitivity-filter-level 99.0 \
        --tranches-file EJ.HC.snps.tranches \
        --recal-file EJ.HC.snps.recal \
        -mode SNP \
        -O EJ.HC.snps.VQSR.vcf.gz

gatk --java-options "-Xmx800g" VariantRecalibrator \
        -R ref.fa \
        -V EJ.HC.snps.VQSR.vcf.gz \
        -resource:conc,known=true,training=true,truth=true,prior=10.0 conc.indel.PASS.vcf.gz \
        -an DP -an QD -an FS -an SOR -an ReadPosRankSum -an MQRankSum \
        -mode INDEL \
        --max-gaussians 6 \
        --rscript-file EJ.HC.snps.indels.plots.R \
        --tranches-file EJ.HC.snps.indels.tranches \
        -O EJ.HC.snps.indels.recal && \
gatk --java-options "-Xmx800g" ApplyVQSR \
        -R ref.fa \
        -V EJ.HC.snps.VQSR.vcf.gz \
        --truth-sensitivity-filter-level 99.0 \
        --tranches-file EJ.HC.snps.indels.tranches \
        --recal-file EJ.HC.snps.indels.recal \
        -O EJ.HC.vqsr.vcf.gz

Dkyuan avatar Apr 09 '24 04:04 Dkyuan