gatk
gatk copied to clipboard
Funcotator - logic error for large deletions overlapping 5'UTR
Bug Report
Affected tool(s) or class(es)
Funcotator
Affected version(s)
- [x] Latest public release version [version v4.1.4.1]
- [ ] Latest master branch as of [date of test?]
Description
Hi @jonn-smith , I saw you often address Funcotator related issues, so I thought this might be of interest to you.
I ran funcotator on a vcf created by mutect2 from RNA-seq data. The vcf includes a large deletion in the GABARAP gene, and when Funcotator processes this annotation, it dies with an error about a query that extends past the end of a contig:
htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig ENST00000571253.1|ENS G00000170296.9|OTTHUMG00000102156.3|OTTHUMT00000440082.2|AC120057.8-003|GABARAP|837|UTR5:1-753|CDS:754-8 37| start:1 stop:895 contigLength:837 at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.getSubsequenceAt(Ca chingIndexedFastaSequenceFile.java:316) at org.broadinstitute.hellbender.engine.ReferenceFileSource.queryAndPrefetch(ReferenceFileSource .java:78) at org.broadinstitute.hellbender.engine.ReferenceDataSource.queryAndPrefetch(ReferenceDataSource .java:64) at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory. getFivePrimeUtrSequenceFromTranscriptFasta(GencodeFuncotationFactory.java:744) at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createUtrFuncotation(GencodeFuncotationFactory.java:1568) at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createGencodeFuncotationOnSingleTranscript(GencodeFuncotationFactory.java:983) at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsHelper(GencodeFuncotationFactory.java:805) at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsHelper(GencodeFuncotationFactory.java:789)
the deletion that is causing the error is 141 base pairs, and I noticed the length of the contig Funcotator is trying to retrieve (895) is equal to the UTR length + deletion length + 1, 753 + 141 + 1. When I looked at the source code around where the error occurs, I see where the length of the retrieved interval is defined (line 738):
final SimpleInterval transcriptInterval = new SimpleInterval( transcriptMapIdAndMetadata.mapKey, transcriptMapIdAndMetadata.fivePrimeUtrStart, transcriptMapIdAndMetadata.fivePrimeUtrEnd + extraBases );
and the logic for how large that extraBases should be (line 1566):
final int numExtraTrailingBases = variant.getReference().length() < defaultNumTrail ingBasesForUtrAnnotationSequenceConstruction ? defaultNumTrailingBasesForUtrAnnotationSequenceConst ruction : variant.getReference().length() + 1;
I believe line 1566 is the source of the problem; there is no check that UTR-end + deletion length extends past the end of the transcript.
Steps to reproduce
download funcotator_dataSources.v1.6.20190124s from Broad FTP server
run funcotator using:
Funcotator -R /tmp/GRCh38.fa -V broken.vcf -O broken.out.vcf --data-sources-path funcotator_dataSources.v1.6.20190124s/ --output-file-format VCF --ref-version hg38
on a vcf with a single variant:
chr17 7241460 . ACTGCAAAAGATACAAGATGCAAGAAAGTCACAGAGGTCAAAAATGCCCTCAAAAGAACAGCTGCTAGGTGGAGCCTCCTCCCGCAGAGACTGCACTCCCACCCACAGGAAGCAAGCCTGAGTCTTGGATCAGGTTCCCAC A .
Expected behavior
Funcotator should not attempt to retrieve a sequence that extends past the end of a transcript
Actual behavior
Funcotator crashes because it attempts to retrieve sequence past the end of a transcript
Thanks for the report!
This is either the same as or related to issue #4307 - so it's on my/our radar.
Alright, glad to hear. Do you take pull requests for this sort of thing? If you have a better idea of how/where the code needs to be changed, i would be happy to take a whack at fixing it.
@shenkers Yes, the GATK team is always willing to review PRs from external contributors!
@davidbenjamin @jonn-smith , I created a pull request that addresses this issue #6546, I'm happy to make any changes, if you have comments/suggestions.
Hi there,
I got a very similar error on the GABARAP gene.
GATK version 4.3.0.0. I am running Funcotator on a vcf from illumina dragen.
htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig ENST00000571253.1|ENSG00000170296.10|OTTHUMG00000102156.4|OTTHUMT00000440082.2|GABARAP-204|GABARAP|837|UTR5:1-753|CDS:754-837| start:1 stop:854 contigLength:837
at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.getSubsequenceAt(CachingIndexedFastaSequenceFile.java:330)
at org.broadinstitute.hellbender.engine.ReferenceFileSource.queryAndPrefetch(ReferenceFileSource.java:78)
at org.broadinstitute.hellbender.engine.ReferenceDataSource.queryAndPrefetch(ReferenceDataSource.java:64)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.getFivePrimeUtrSequenceFromTranscriptFasta(GencodeFuncotationFactory.java:778)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createUtrFuncotation(GencodeFuncotationFactory.java:1619)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createGencodeFuncotationOnSingleTranscript(GencodeFuncotationFactory.java:1025)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsHelper(GencodeFuncotationFactory.java:847)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsHelper(GencodeFuncotationFactory.java:831)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.lambda$createGencodeFuncotationsByAllTranscripts$0(GencodeFuncotationFactory.java:508)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createGencodeFuncotationsByAllTranscripts(GencodeFuncotationFactory.java:509)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsOnVariant(GencodeFuncotationFactory.java:564)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.determineFuncotations(DataSourceFuncotationFactory.java:243)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:211)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:182)
at org.broadinstitute.hellbender.tools.funcotator.FuncotatorEngine.lambda$createFuncotationMapForVariant$0(FuncotatorEngine.java:152)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
at org.broadinstitute.hellbender.tools.funcotator.FuncotatorEngine.createFuncotationMapForVariant(FuncotatorEngine.java:162)
at org.broadinstitute.hellbender.tools.funcotator.Funcotator.enqueueAndHandleVariant(Funcotator.java:924)
at org.broadinstitute.hellbender.tools.funcotator.Funcotator.apply(Funcotator.java:878)
at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1095)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)