gatk
java.lang.ArrayIndexOutOfBoundsException when creating tabix index
Bug Report
Affected tool(s) or class(es)
gatk SortVcf
Affected version(s)
Mac OS X 10.16 x86_64; OpenJDK 64-Bit Server VM 1.8.0_322-b06; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.4.1
Description
SortVcf finishes sorting and writes out a VCF, but then fails with java.lang.ArrayIndexOutOfBoundsException when generating the tabix index. To work around this, I can run with --CREATE_INDEX false and then run tabix to generate the index.
INFO 2022-05-06 12:14:45 SortVcf wrote 675,000 records. Elapsed time: 00:00:03s. Time for last 25,000: 0s. Last read position: chr3:41,521,469
INFO 2022-05-06 12:14:45 SortVcf wrote 700,000 records. Elapsed time: 00:00:03s. Time for last 25,000: 0s. Last read position: chr3:61,833,861
INFO 2022-05-06 12:14:45 SortVcf wrote 725,000 records. Elapsed time: 00:00:03s. Time for last 25,000: 0s. Last read position: chr3:78,534,676
INFO 2022-05-06 12:14:45 SortVcf wrote 750,000 records. Elapsed time: 00:00:03s. Time for last 25,000: 0s. Last read position: chr3:100,707,682
INFO 2022-05-06 12:14:45 SortVcf wrote 775,000 records. Elapsed time: 00:00:03s. Time for last 25,000: 0s. Last read position: chr3:117,527,190
INFO 2022-05-06 12:14:45 SortVcf wrote 800,000 records. Elapsed time: 00:00:03s. Time for last 25,000: 0s. Last read position: chr3:134,613,380
INFO 2022-05-06 12:14:45 SortVcf wrote 825,000 records. Elapsed time: 00:00:03s. Time for last 25,000: 0s. Last read position: chr3:153,780,108
INFO 2022-05-06 12:14:45 SortVcf wrote 850,000 records. Elapsed time: 00:00:03s. Time for last 25,000: 0s. Last read position: chr3:173,329,831
INFO 2022-05-06 12:14:46 SortVcf wrote 875,000 records. Elapsed time: 00:00:03s. Time for last 25,000: 0s. Last read position: chr3:192,133,262
[Fri May 06 12:14:46 EDT 2022] picard.vcf.SortVcf done. Elapsed time: 0.36 minutes.
Runtime.totalMemory()=2855272448
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
java.lang.ArrayIndexOutOfBoundsException: 16799
at htsjdk.samtools.BinningIndexBuilder.processFeature(BinningIndexBuilder.java:102)
at htsjdk.tribble.index.tabix.TabixIndexCreator.finalizeFeature(TabixIndexCreator.java:106)
at htsjdk.tribble.index.tabix.TabixIndexCreator.addFeature(TabixIndexCreator.java:92)
at htsjdk.variant.variantcontext.writer.IndexingVariantContextWriter.add(IndexingVariantContextWriter.java:203)
at htsjdk.variant.variantcontext.writer.VCFWriter.add(VCFWriter.java:242)
at picard.vcf.SortVcf.writeSortedOutput(SortVcf.java:183)
at picard.vcf.SortVcf.doWork(SortVcf.java:101)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
Expected output
There's almost certainly some format issue with my VCF, but ideally GATK would have a better error message than ArrayIndexOutOfBoundsException.
@bw2 I agree, this is an unhelpful error. We should fix it, but the fix probably has to be made in htsjdk (or in Picard, since this is a Picard tool we import).
I'm not 100% sure what the issue is; it seems like we're somehow resolving an invalid bin in the index. I would expect that to happen with a very long chromosome, but 193,000,000 shouldn't be too large. Are you using non-human data or something with an extremely long variant?
Yes, this was human data. It might have been a long variant.
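One way to sanity-check the long-variant idea is to do the window arithmetic by hand. The sketch below assumes that the `16799` in the exception message is an offset into the 16 kb (2^14 bp) linear index used by tabix/BAI-style indexes; that reading is an assumption and has not been confirmed against the htsjdk source. The helper names are hypothetical.

```python
# Sketch: map genomic coordinates to 16 kb linear-index windows.
# Assumption: the failing array index (16799) is a linear-index window number.

WINDOW_SHIFT = 14  # 2**14 = 16,384 bp per linear-index window


def linear_window(pos_1based: int) -> int:
    """Zero-based linear-index window containing a 1-based position."""
    return max(pos_1based - 1, 0) >> WINDOW_SHIFT


def window_start(window: int) -> int:
    """First 1-based position covered by a linear-index window."""
    return (window << WINDOW_SHIFT) + 1


last_logged_pos = 192_133_262  # "Last read position" in the final log line
failing_index = 16_799         # value reported by ArrayIndexOutOfBoundsException

print("window of last logged position:", linear_window(last_logged_pos))
print("first position in failing window:", window_start(failing_index))
```

Under that assumption, the failing window starts at about 275.2 Mb, well past the last logged start position and past the roughly 198 Mb length of chr3 in the common human references. That would be consistent with a record whose end coordinate runs off the end of the contig, though it does not prove it.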
@bw2 Do you have a small file that reproduces this issue? We'll need a runnable test case that reproduces this in order to debug further.
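A possible way to narrow the input down to a small reproducer is to scan for records whose end coordinate exceeds the contig length declared in the VCF header. This is only a sketch: it assumes the end is taken from an `END=` INFO value when present and otherwise from `POS + len(REF) - 1`, and the function names are hypothetical rather than part of GATK, Picard, or htsjdk.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

ID_RE = re.compile(r"[<,]ID=([^,>]+)")
LENGTH_RE = re.compile(r"[<,]length=(\d+)")


def record_end(pos: int, ref: str, info: str) -> int:
    """End coordinate of a record: INFO END if larger, else POS + len(REF) - 1."""
    end = pos + len(ref) - 1
    for field in info.split(";"):
        if field.startswith("END="):
            try:
                end = max(end, int(field[4:]))
            except ValueError:
                pass  # malformed END value; fall back to the REF-based end
    return end


def find_overlong_records(lines: Iterable[str]) -> Iterator[Tuple[str, int, int, int]]:
    """Yield (chrom, pos, end, contig_length) for records ending past their contig."""
    lengths: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##contig"):
            contig_id = ID_RE.search(line)
            length = LENGTH_RE.search(line)
            if contig_id and length:
                lengths[contig_id.group(1)] = int(length.group(1))
            continue
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        chrom, pos, ref = cols[0], int(cols[1]), cols[3]
        info = cols[7] if len(cols) > 7 else ""
        end = record_end(pos, ref, info)
        limit = lengths.get(chrom)
        if limit is not None and end > limit:
            yield chrom, pos, end, limit


# Tiny made-up example; real input would be the lines of the failing VCF.
sample = [
    "##fileformat=VCFv4.2\n",
    "##contig=<ID=chr3,length=1000>\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr3\t100\t.\tA\tT\t.\t.\t.\n",
    "chr3\t900\t.\tN\t<DEL>\t.\t.\tSVTYPE=DEL;END=5000\n",
]
for hit in find_overlong_records(sample):
    print(hit)
```

Any record this flags, together with the header, could be written to a separate file and run through SortVcf to see whether it triggers the same exception. If the header has no `##contig` lengths, the scan reports nothing and the lengths would need to come from the reference dictionary instead.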
I'm not sure if it'll fix or affect this issue, but I noticed this and want to note that @tedsharpe has an active pull request to fix issues with tabix index generation: https://github.com/broadinstitute/gatk/pull/7858