gatk icon indicating copy to clipboard operation
gatk copied to clipboard

Funcotator - logic error for large deletions overlapping 5'UTR

Open shenkers opened this issue 5 years ago • 6 comments

Bug Report

Affected tool(s) or class(es)

Funcotator

Affected version(s)

  • [x] Latest public release version [version v4.1.4.1]
  • [ ] Latest master branch as of [date of test?]

Description

Hi @jonn-smith , I saw you often address Funcotator related issues, so I thought this might be of interest to you.

I ran funcotator on a vcf created by mutect2 from RNA-seq data. The vcf includes a large deletion in the GABARAP gene, and when Funcotator processes this annotation, it dies with an error about a query that extends past the end of a contig:

htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig ENST00000571253.1|ENS G00000170296.9|OTTHUMG00000102156.3|OTTHUMT00000440082.2|AC120057.8-003|GABARAP|837|UTR5:1-753|CDS:754-8 37| start:1 stop:895 contigLength:837 at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.getSubsequenceAt(Ca chingIndexedFastaSequenceFile.java:316) at org.broadinstitute.hellbender.engine.ReferenceFileSource.queryAndPrefetch(ReferenceFileSource .java:78) at org.broadinstitute.hellbender.engine.ReferenceDataSource.queryAndPrefetch(ReferenceDataSource .java:64) at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory. getFivePrimeUtrSequenceFromTranscriptFasta(GencodeFuncotationFactory.java:744) at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createUtrFuncotation(GencodeFuncotationFactory.java:1568) at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createGencodeFuncotationOnSingleTranscript(GencodeFuncotationFactory.java:983) at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsHelper(GencodeFuncotationFactory.java:805) at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsHelper(GencodeFuncotationFactory.java:789)

the deletion that is causing the error is 141 base pairs, and I noticed the length of the contig Funcotator is trying to retrieve (895) is equal to the UTR length + deletion length + 1, 753 + 141 + 1. When I looked at the source code around where the error occurs, I see where the length of the retrieved interval is defined (line 738):

       final SimpleInterval transcriptInterval = new SimpleInterval(
              transcriptMapIdAndMetadata.mapKey,
              transcriptMapIdAndMetadata.fivePrimeUtrStart,
             transcriptMapIdAndMetadata.fivePrimeUtrEnd + extraBases
     );

and the logic for how large that extraBases should be (line 1566):

final int numExtraTrailingBases = variant.getReference().length() < defaultNumTrail ingBasesForUtrAnnotationSequenceConstruction ? defaultNumTrailingBasesForUtrAnnotationSequenceConst ruction : variant.getReference().length() + 1;

I believe line 1566 is the source of the problem; there is no check that UTR-end + deletion length extends past the end of the transcript.

Steps to reproduce

download funcotator_dataSources.v1.6.20190124s from Broad FTP server

run funcotator using:

Funcotator -R /tmp/GRCh38.fa -V broken.vcf -O broken.out.vcf --data-sources-path funcotator_dataSources.v1.6.20190124s/ --output-file-format VCF --ref-version hg38

on a vcf with a single variant:

chr17 7241460 . ACTGCAAAAGATACAAGATGCAAGAAAGTCACAGAGGTCAAAAATGCCCTCAAAAGAACAGCTGCTAGGTGGAGCCTCCTCCCGCAGAGACTGCACTCCCACCCACAGGAAGCAAGCCTGAGTCTTGGATCAGGTTCCCAC A .

Expected behavior

Funcotator should not attempt to retrieve a sequence that extends past the end of a transcript

Actual behavior

Funcotator crashes because it attempts to retrieve sequence past the end of a transcript

shenkers avatar Jan 03 '20 03:01 shenkers

Thanks for the report!

This is either the same as or related to issue #4307 - so it's on my/our radar.

jonn-smith avatar Jan 06 '20 17:01 jonn-smith

Alright, glad to hear. Do you take pull requests for this sort of thing? If you have a better idea of how/where the code needs to be changed, i would be happy to take a whack at fixing it.

shenkers avatar Jan 06 '20 18:01 shenkers

@shenkers Yes, the GATK team is always willing to review PRs from external contributors!

davidbenjamin avatar Feb 12 '20 17:02 davidbenjamin

@davidbenjamin @jonn-smith , I created a pull request that addresses this issue #6546, I'm happy to make any changes, if you have comments/suggestions.

shenkers avatar Apr 08 '20 18:04 shenkers

Hi there,

I got a very similar error on the GABARAP gene.

GATK version 4.3.0.0. I am running Funcotator on a vcf from illumina dragen.

htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig ENST00000571253.1|ENSG00000170296.10|OTTHUMG00000102156.4|OTTHUMT00000440082.2|GABARAP-204|GABARAP|837|UTR5:1-753|CDS:754-837| start:1 stop:854 contigLength:837
        at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.getSubsequenceAt(CachingIndexedFastaSequenceFile.java:330)
        at org.broadinstitute.hellbender.engine.ReferenceFileSource.queryAndPrefetch(ReferenceFileSource.java:78)
        at org.broadinstitute.hellbender.engine.ReferenceDataSource.queryAndPrefetch(ReferenceDataSource.java:64)
        at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.getFivePrimeUtrSequenceFromTranscriptFasta(GencodeFuncotationFactory.java:778)
        at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createUtrFuncotation(GencodeFuncotationFactory.java:1619)
        at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createGencodeFuncotationOnSingleTranscript(GencodeFuncotationFactory.java:1025)
        at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsHelper(GencodeFuncotationFactory.java:847)
        at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsHelper(GencodeFuncotationFactory.java:831)
        at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.lambda$createGencodeFuncotationsByAllTranscripts$0(GencodeFuncotationFactory.java:508)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
        at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
        at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createGencodeFuncotationsByAllTranscripts(GencodeFuncotationFactory.java:509)
        at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsOnVariant(GencodeFuncotationFactory.java:564)
        at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.determineFuncotations(DataSourceFuncotationFactory.java:243)
        at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:211)
        at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:182)
        at org.broadinstitute.hellbender.tools.funcotator.FuncotatorEngine.lambda$createFuncotationMapForVariant$0(FuncotatorEngine.java:152)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
        at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
        at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
        at org.broadinstitute.hellbender.tools.funcotator.FuncotatorEngine.createFuncotationMapForVariant(FuncotatorEngine.java:162)
        at org.broadinstitute.hellbender.tools.funcotator.Funcotator.enqueueAndHandleVariant(Funcotator.java:924)
        at org.broadinstitute.hellbender.tools.funcotator.Funcotator.apply(Funcotator.java:878)
        at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
        at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
        at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
        at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
        at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1095)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)

zhanyinx avatar Aug 28 '23 10:08 zhanyinx