
ReblockGVCF fails with an exception: No shortest ALT at 4645646543 across alleles: [*]

Open · maarten-k opened this issue 3 years ago · 3 comments

Bug Report

Affected tool(s) or class(es)

GATK ReblockGVCF

Affected version(s)

  • 4.2.5.0 and 4.2.6.1

Description

I am running ReblockGVCF on GVCFs that were generated with HaplotypeCaller version 4.0.1.4. About 1 in 500 samples crashes with the following error: No shortest ALT at 464564654 across alleles: [*].

Complete error message:

org.broadinstitute.hellbender.exceptions.GATKException: Exception thrown at chr4::464564654[VC /bug.g.vcf.gz @ redacted] filters=
        at org.broadinstitute.hellbender.engine.MultiVariantWalker.lambda$traverse$1(MultiVariantWalker.java:145)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
        at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
        at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
        at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
        at org.broadinstitute.hellbender.engine.MultiVariantWalker.traverse(MultiVariantWalker.java:136)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1085)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: org.broadinstitute.hellbender.exceptions.GATKException: No shortest ALT at 464564654 across alleles: [*]
        at org.broadinstitute.hellbender.tools.walkers.variantutils.ReblockGVCF.addRefBlockIfNecessary(ReblockGVCF.java:632)
        at org.broadinstitute.hellbender.tools.walkers.variantutils.ReblockGVCF.cleanUpHighQualityVariant(ReblockGVCF.java:596)
        at org.broadinstitute.hellbender.tools.walkers.variantutils.ReblockGVCF.regenotypeVC(ReblockGVCF.java:347)
        at org.broadinstitute.hellbender.tools.walkers.variantutils.ReblockGVCF.apply(ReblockGVCF.java:273)
        at org.broadinstitute.hellbender.engine.MultiVariantWalker.lambda$traverse$1(MultiVariantWalker.java:139)
        ... 20 more

Steps to reproduce

gatk ReblockGVCF -R /hs38DH.fa -V bug.g.vcf.gz -O bug.rb.vcf.gz

(I have generated a minimal example that reproduces the problem, but I am not sure I am allowed to publish the data publicly. I can send it privately; it is only 21 KB.)

Expected behaviour

A complete reblocked GVCF file.

Actual behavior

GATK crashes with the exception shown above.

maarten-k · Jul 22 '22 20:07

@maarten-k Can you please check whether there's a <NON_REF> allele present at the locus it's complaining about (464564654), in addition to the * allele?

Also, could you try re-generating your GVCFs with a more recent version of HaplotypeCaller? 4.0.1.4 is quite old at this point...

droazen · Jul 25 '22 19:07
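The check requested above (are both `*` and `<NON_REF>` present at the failing locus?) can be scripted. Below is a minimal Python sketch, assuming the GVCF is plain text or gzip/BGZF-compressed (BGZF is gzip-compatible, so Python's `gzip` module can read it sequentially). `open_vcf` and `alleles_at` are hypothetical helper names, not part of GATK; on an indexed file, `tabix` or `bcftools view -r` would be the more usual tools.

```python
import gzip
from typing import Iterable, List, Tuple


def open_vcf(path: str):
    """Open a plain or gzip/BGZF-compressed VCF as a text stream."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def alleles_at(lines: Iterable[str], chrom: str, pos: int) -> List[Tuple[str, List[str]]]:
    """Return (REF, ALT alleles) for every record whose CHROM/POS match exactly."""
    hits = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and column-header lines
        fields = line.rstrip("\n").split("\t")
        if fields[0] == chrom and int(fields[1]) == pos:
            hits.append((fields[3], fields[4].split(",")))
    return hits
```

For example, `alleles_at(open_vcf("bug.g.vcf.gz"), "chr4", 464564654)` returns the REF and ALT list of each record starting at that position, so `"<NON_REF>" in alts` answers the question directly. This scans the whole file, which is slow for a WGS GVCF but needs no index.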

Yes. 43 bases before this position there is a record with ALT alleles C,<NON_REF>, whose REF allele is about 270 base pairs long.


Yes, I know this is an old version, but I am finalising a callset of 15,000+ WGS samples, so switching is not easy for me. I will, however, also test this with the newest version.

maarten-k · Jul 27 '22 20:07
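The record described in the reply above suggests where the `*` comes from: in VCF 4.2 and later, `*` denotes an allele that is missing because of an overlapping upstream deletion. A minimal sketch of the overlap arithmetic follows, using the thread's numbers (record 43 bp upstream, REF of roughly 270 bp; the exact length and the placeholder REF sequence are assumptions). It assumes the usual left-padded deletion representation, where the first REF base is kept and the deleted bases run from POS + 1 to POS + len(REF) - 1. `record_end` and `deletion_spans` are hypothetical names.

```python
def record_end(pos: int, ref: str) -> int:
    """1-based inclusive end coordinate of a VCF record: POS + len(REF) - 1."""
    return pos + len(ref) - 1


def deletion_spans(del_pos: int, del_ref: str, site: int) -> bool:
    """True if `site` lies within the deleted bases of a left-padded deletion.

    The padding base at del_pos itself is not deleted, so the covered
    range is (del_pos, record_end(del_pos, del_ref)].
    """
    return del_pos < site <= record_end(del_pos, del_ref)


# Thread numbers: failing site, deletion record 43 bp upstream, REF ~270 bp.
SITE = 464564654
DEL_POS = SITE - 43                # 464564611
DEL_REF = "C" + "N" * 269          # hypothetical placeholder, 270 bases long
```

With these values the deletion record ends at 464564880, so the failing position 464564654 falls inside the deleted bases; that is consistent with a sample-level `*` allele being reported there.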

I can confirm this no longer happens with GATK 4.2.6.1. A minor correction from my side: the GATK version with which the issue occurred is 4.1.4.0, not 4.0.1.4.

Can you advise on a workaround? I could remove the problematic lines from the files with basic command-line tools, but if there is a more sophisticated approach, please let me know.

maarten-k · Aug 06 '22 17:08
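The line-removal workaround mentioned above could be sketched in Python as follows. This is not a GATK-provided fix, and it rests on assumptions not confirmed in the thread: (1) the offending record is the one at the locus named in the error, and its ALT alleles, ignoring `<NON_REF>`, are only `*` (inferred from `across alleles: [*]`); (2) dropping it is acceptable for the analysis. Removing a record can leave a gap in the GVCF's reference-confidence coverage at that site. The script only drops records at loci you list explicitly, so other `*` records are untouched. `is_star_only_at`, `drop_records` and `filter_file` are hypothetical names.

```python
import gzip
from typing import Iterable, Iterator, Set, Tuple

Locus = Tuple[str, int]  # (CHROM, POS) as reported in the exception


def is_star_only_at(line: str, loci: Set[Locus]) -> bool:
    """True for a data line at one of `loci` whose ALTs, ignoring <NON_REF>, are only '*'."""
    if line.startswith("#"):
        return False  # never drop header lines
    fields = line.rstrip("\n").split("\t")
    if (fields[0], int(fields[1])) not in loci:
        return False
    real_alts = [a for a in fields[4].split(",") if a != "<NON_REF>"]
    return real_alts == ["*"]


def drop_records(lines: Iterable[str], loci: Set[Locus]) -> Iterator[str]:
    """Yield every line except the matching star-only records."""
    for line in lines:
        if not is_star_only_at(line, loci):
            yield line


def filter_file(in_path: str, out_path: str, loci: Set[Locus]) -> None:
    """Copy a plain or gzip/BGZF VCF to an uncompressed VCF without the matching records."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        dst.writelines(drop_records(src, loci))
```

For example, `filter_file("bug.g.vcf.gz", "bug.filtered.g.vcf", {("chr4", 464564654)})` writes an uncompressed copy; it would then need to be compressed and indexed (e.g. `bgzip bug.filtered.g.vcf` followed by `tabix -p vcf bug.filtered.g.vcf.gz`) before rerunning ReblockGVCF. If a sample fails at several loci, rerunning after each fix reveals them one at a time.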