Reference isn't being parsed/detected correctly when using CalibrateDragstrModel in parallel mode

Open nvnieuwk opened this issue 2 years ago • 8 comments

Bug Report

Affected tool(s) or class(es)

GATK CalibrateDragstrModel

Affected version(s)

  • [x] Latest public release version [4.3.0.0]
  • [ ] Latest master branch as of [date of test?]

Description

When CalibrateDragstrModel runs in parallel mode (--threads 12 in the command below), the supplied reference doesn't appear to be detected correctly, and the run fails with the following error and stack trace:

Using GATK jar /usr/local/share/gatk4-4.3.0.0-0/gatk-package-4.3.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx72g -jar /usr/local/share/gatk4-4.3.0.0-0/gatk-package-4.3.0.0-local.jar CalibrateDragstrModel --input input.cram --output input.txt --reference hg38.fa --str-table-path hg38.zip --threads 12 --intervals fasta_bed.bed --tmp-dir .
10:24:21.117 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/local/share/gatk4-4.3.0.0-0/gatk-package-4.3.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
10:24:21.289 INFO  CalibrateDragstrModel - ------------------------------------------------------------
10:24:21.289 INFO  CalibrateDragstrModel - The Genome Analysis Toolkit (GATK) v4.3.0.0
10:24:21.289 INFO  CalibrateDragstrModel - For support and documentation go to https://software.broadinstitute.org/gatk/
10:24:21.289 INFO  CalibrateDragstrModel - Executing as nvnieuwk on Linux v4.18.0-372.36.1.el8_6.x86_64 amd64
10:24:21.289 INFO  CalibrateDragstrModel - Java runtime: OpenJDK 64-Bit Server VM v11.0.15-internal+0-adhoc..src
10:24:21.289 INFO  CalibrateDragstrModel - Start Date/Time: January 2, 2023 at 10:24:21 AM GMT
10:24:21.289 INFO  CalibrateDragstrModel - ------------------------------------------------------------
10:24:21.289 INFO  CalibrateDragstrModel - ------------------------------------------------------------
10:24:21.290 INFO  CalibrateDragstrModel - HTSJDK Version: 3.0.1
10:24:21.290 INFO  CalibrateDragstrModel - Picard Version: 2.27.5
10:24:21.290 INFO  CalibrateDragstrModel - Built for Spark Version: 2.4.5
10:24:21.290 INFO  CalibrateDragstrModel - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:24:21.290 INFO  CalibrateDragstrModel - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:24:21.290 INFO  CalibrateDragstrModel - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:24:21.290 INFO  CalibrateDragstrModel - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:24:21.290 INFO  CalibrateDragstrModel - Deflater: IntelDeflater
10:24:21.290 INFO  CalibrateDragstrModel - Inflater: IntelInflater
10:24:21.291 INFO  CalibrateDragstrModel - GCS max retries/reopens: 20
10:24:21.291 INFO  CalibrateDragstrModel - Requester pays: disabled
10:24:21.291 INFO  CalibrateDragstrModel - Initializing engine
10:24:21.937 INFO  FeatureManager - Using codec BEDCodec to read file file:///kyukon/scratch/gent/vo/000/gvo00082/vsc44804/nxf.4BNO5qL7DM/fasta_bed.bed
10:24:21.962 INFO  IntervalArgumentCollection - Processing 3217346917 bp from intervals
10:24:22.008 INFO  CalibrateDragstrModel - Running in parallel using the requested number of threads: 12
10:24:22.008 INFO  CalibrateDragstrModel - Done initializing engine
10:24:22.008 INFO  ProgressMeter - Starting traversal
10:24:22.008 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Records Processed   Records/Minute
10:24:32.859 INFO  ProgressMeter -        chr1:26000000              0.2                 59038         326477.4
10:24:42.867 INFO  ProgressMeter -        chr1:83000000              0.3                184245         529998.1
10:24:52.965 INFO  ProgressMeter -       chr1:137193529              0.5                306766         594565.4
10:25:03.307 INFO  ProgressMeter -       chr1:193193529              0.7                428759         622924.6
10:25:13.318 INFO  ProgressMeter -         chr2:3237107              0.9                564835         660497.0
10:25:23.358 INFO  ProgressMeter -        chr2:57237107              1.0                681209         666219.1
10:25:33.392 INFO  ProgressMeter -       chr2:109237107              1.2                799610         672091.8
10:25:44.527 INFO  ProgressMeter -       chr2:177512416              1.4                930822         676805.6
10:25:54.821 INFO  ProgressMeter -       chr2:237512416              1.5               1069999         691712.8
10:26:04.863 INFO  ProgressMeter -        chr3:54999378              1.7               1197525         698570.8
10:26:09.642 INFO  CalibrateDragstrModel - Shutting down engine
[January 2, 2023 at 10:26:09 AM GMT] org.broadinstitute.hellbender.tools.dragstr.CalibrateDragstrModel done. Elapsed time: 1.81 minutes.
Runtime.totalMemory()=47647293440
java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Requested start 8613 is beyond the sequence length HLA-DRB1*04:03:01
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
    at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:600)
    at java.base/java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:1006)
    at org.broadinstitute.hellbender.utils.Utils.runInParallel(Utils.java:1479)
    at org.broadinstitute.hellbender.tools.dragstr.CalibrateDragstrModel.collectCaseStatsParallel(CalibrateDragstrModel.java:551)
    at org.broadinstitute.hellbender.tools.dragstr.CalibrateDragstrModel.traverse(CalibrateDragstrModel.java:202)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1095)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Requested start 8613 is beyond the sequence length HLA-DRB1*04:03:01
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
    at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:600)
    at java.base/java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:678)
    at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:737)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:919)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.base/java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:558)
    at org.broadinstitute.hellbender.tools.dragstr.CalibrateDragstrModel.lambda$collectCaseStatsParallel$14(CalibrateDragstrModel.java:568)
    at java.base/java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1448)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Caused by: java.lang.IllegalArgumentException: Requested start 8613 is beyond the sequence length HLA-DRB1*04:03:01
    at htsjdk.samtools.cram.ref.ReferenceSource.getReferenceBasesByRegion(ReferenceSource.java:207)
    at htsjdk.samtools.cram.build.CRAMReferenceRegion.fetchReferenceBasesByRegion(CRAMReferenceRegion.java:169)
    at htsjdk.samtools.cram.structure.Slice.normalizeCRAMRecords(Slice.java:502)
    at htsjdk.samtools.cram.structure.Container.getSAMRecords(Container.java:322)
    at htsjdk.samtools.CRAMIterator.nextContainer(CRAMIterator.java:112)
    at htsjdk.samtools.CRAMIterator.hasNext(CRAMIterator.java:204)
    at htsjdk.samtools.CRAMFileReader$CRAMIntervalIteratorBase.getNextRecord(CRAMFileReader.java:589)
    at htsjdk.samtools.CRAMFileReader$CRAMIntervalIteratorBase.initializeIterator(CRAMFileReader.java:562)
    at htsjdk.samtools.CRAMFileReader$CRAMIntervalIterator.<init>(CRAMFileReader.java:620)
    at htsjdk.samtools.CRAMFileReader$CRAMIntervalIterator.<init>(CRAMFileReader.java:615)
    at htsjdk.samtools.CRAMFileReader.query(CRAMFileReader.java:487)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:550)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:417)
    at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.loadNextIterator(SamReaderQueryingIterator.java:130)
    at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.<init>(SamReaderQueryingIterator.java:69)
    at org.broadinstitute.hellbender.engine.ReadsPathDataSource.prepareIteratorsForTraversal(ReadsPathDataSource.java:412)
    at org.broadinstitute.hellbender.engine.ReadsPathDataSource.prepareIteratorsForTraversal(ReadsPathDataSource.java:389)
    at org.broadinstitute.hellbender.engine.ReadsPathDataSource.query(ReadsPathDataSource.java:352)
    at org.broadinstitute.hellbender.tools.dragstr.CalibrateDragstrModel.readStream(CalibrateDragstrModel.java:915)
    at org.broadinstitute.hellbender.tools.dragstr.CalibrateDragstrModel.lambda$null$11(CalibrateDragstrModel.java:556)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
    at org.broadinstitute.hellbender.tools.dragstr.InterleavingListSpliterator.forEachRemaining(InterleavingListSpliterator.java:87)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:952)
    at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:926)
    at java.base/java.util.stream.AbstractTask.compute(AbstractTask.java:327)
    at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
    ... 5 more

However, the tool works when run single-threaded with exactly the same options.

Steps to reproduce

Unfortunately, I haven't been able to create a reproducible example: I've only encountered this with non-public data, which I can't share here. I'd be happy to run tests for you, though.
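
One check that can be run without sharing any read data is to confirm that the contig lengths recorded in the CRAM header agree with the reference index. The error above complains about a requested start beyond the length of HLA-DRB1*04:03:01, so a length disagreement for that contig would be worth ruling out, even though the single-threaded run succeeding makes it an unlikely cause. The following is a minimal sketch, not part of GATK; the function names are hypothetical, and it assumes the header has been dumped to text (for example with samtools view -H) and that the reference has a standard .fai index next to it.

```python
def parse_fai(text):
    """Map contig name -> length from the text of a .fai index.

    Each .fai line is tab-separated; column 1 is the name, column 2 the length.
    """
    lengths = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        lengths[fields[0]] = int(fields[1])
    return lengths


def parse_sq_lines(header_text):
    """Map contig name -> length from the @SQ lines of a SAM/CRAM text header."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Split each TAG:VALUE field on the first colon only, because contig
        # names such as HLA-DRB1*04:03:01 contain colons themselves.
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def find_mismatches(header_lengths, fai_lengths):
    """Return (name, header_length, fai_length) for every header contig whose
    length differs from the reference index; fai_length is None if the contig
    is missing from the index."""
    problems = []
    for name, header_len in header_lengths.items():
        fai_len = fai_lengths.get(name)
        if fai_len != header_len:
            problems.append((name, header_len, fai_len))
    return problems


def compare_files(header_path, fai_path):
    """Convenience wrapper: read both files from disk and compare them."""
    with open(header_path) as h, open(fai_path) as f:
        return find_mismatches(parse_sq_lines(h.read()), parse_fai(f.read()))
```

Calling compare_files("header.txt", "hg38.fa.fai") (both file names are assumptions for illustration) returns an empty list when every contig in the header matches the index. An empty result would point away from a plain reference mismatch and towards something specific to the parallel code path.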

nvnieuwk, Jan 03 '23 13:01