
Memory issues when running BaseRecalibrator

MetteBoge opened this issue 11 months ago (status: open, 0 comments)

Hi, I need to run BaseRecalibrator as part of preprocessing my RNA-seq BAM files before variant calling, but I am running into memory problems. Here is the error I get:

  22:30:25.477 INFO  BaseRecalibrator - Start Date/Time: March 8, 2024 at 10:30:25 PM GMT
  22:30:25.477 INFO  BaseRecalibrator - ------------------------------------------------------------
  22:30:25.477 INFO  BaseRecalibrator - ------------------------------------------------------------
  22:30:25.477 INFO  BaseRecalibrator - HTSJDK Version: 4.1.0
  22:30:25.478 INFO  BaseRecalibrator - Picard Version: 3.1.1
  22:30:25.478 INFO  BaseRecalibrator - Built for Spark Version: 3.5.0
  22:30:25.478 INFO  BaseRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
  22:30:25.478 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
  22:30:25.478 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
  22:30:25.478 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
  22:30:25.478 INFO  BaseRecalibrator - Deflater: IntelDeflater
  22:30:25.478 INFO  BaseRecalibrator - Inflater: IntelInflater
  22:30:25.478 INFO  BaseRecalibrator - GCS max retries/reopens: 20
  22:30:25.479 INFO  BaseRecalibrator - Requester pays: disabled
  22:30:25.479 INFO  BaseRecalibrator - Initializing engine
  WARNING       2024-03-08 22:30:25     SamFiles        The index file /mnt/storage/users/dockworker/mpedersen/work/RNAseq_variant_call/work/d6/362957b6215ad2e8193c27c895d42d/VR0024SA.withoutERCCs.withRG.markedDup.splitNcigar.bai was found by resolving the canonical path of a symlink: VR0024SA.withoutERCCs.withRG.markedDup.splitNcigar.bam -> /mnt/storage/users/dockworker/mpedersen/work/RNAseq_variant_call/work/d6/362957b6215ad2e8193c27c895d42d/VR0024SA.withoutERCCs.withRG.markedDup.splitNcigar.bam
  22:30:25.631 INFO  FeatureManager - Using codec VCFCodec to read file file://1000G_phase1.snps.high_confidence.hg38.vcf.gz
  22:30:25.754 INFO  FeatureManager - Using codec VCFCodec to read file file://Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
  23:39:21.541 INFO  BaseRecalibrator - Shutting down engine
  [March 8, 2024 at 11:39:21 PM GMT] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 68.94 minutes.
  Runtime.totalMemory()=214748364800
  java.lang.OutOfMemoryError: Java heap space
        at htsjdk.tribble.readers.TabixReader.readInt(TabixReader.java:189)
        at htsjdk.tribble.readers.TabixReader.readIndex(TabixReader.java:274)
        at htsjdk.tribble.readers.TabixReader.readIndex(TabixReader.java:287)
        at htsjdk.tribble.readers.TabixReader.<init>(TabixReader.java:165)
        at htsjdk.tribble.readers.TabixReader.<init>(TabixReader.java:129)
        at htsjdk.tribble.TabixFeatureReader.<init>(TabixFeatureReader.java:80)
        at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:117)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:433)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:377)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:319)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:291)
        at org.broadinstitute.hellbender.engine.FeatureManager.addToFeatureSources(FeatureManager.java:245)
        at org.broadinstitute.hellbender.engine.FeatureManager.initializeFeatureSources(FeatureManager.java:208)
        at org.broadinstitute.hellbender.engine.FeatureManager.<init>(FeatureManager.java:155)
        at org.broadinstitute.hellbender.engine.ReadWalker.initializeFeatures(ReadWalker.java:72)
        at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:726)
        at org.broadinstitute.hellbender.engine.ReadWalker.onStartup(ReadWalker.java:51)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:147)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
        at org.broadinstitute.hellbender.Main.main(Main.java:306)
  Using GATK jar /gatk/gatk-package-4.5.0.0-local.jar
  Running:
      java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms200G -Xmx200G -XX:ParallelGCThreads=2 -jar /gatk/gatk-package-4.5.0.0-local.jar BaseRecalibrator -I VR0024SA.withoutERCCs.withRG.markedDup.splitNcigar.bam -O VR0024SA.withoutERCCs.withRG.markedDup.splitNcigar.baseRecal.bam -R GRCh38.primary_assembly.genome.fa --known-sites 1000G_phase1.snps.high_confidence.hg38.vcf.gz --known-sites Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --tmp-dir /tmp --disable-bam-index-caching true

Work dir:
  /mnt/storage/users/dockworker/mpedersen/work/RNAseq_variant_call/work/71/ac26344f0e095f7fe77cbb45a334db

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details
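The `Runtime.totalMemory()=214748364800` line in the log reports the JVM heap size in bytes. The minimal shell sketch below converts that value to GiB. It assumes HotSpot's binary interpretation of the `G` suffix in `-Xmx200G` (1 GiB = 1024^3 bytes), and its variable names are illustrative only. The result shows that the JVM did start with the full 200 GiB requested via `-Xms200G -Xmx200G`, so the heap ran out even at that size.

```shell
#!/usr/bin/env bash
# Heap size reported by Runtime.totalMemory() in the log above, in bytes.
total_memory_bytes=214748364800
# Assumption: -Xmx200G means 200 * 1024^3 bytes (binary gigabytes).
bytes_per_gib=$((1024 * 1024 * 1024))
# Integer division is exact here, so this prints "200 GiB".
echo "$((total_memory_bytes / bytes_per_gib)) GiB"
```

Because `-Xms` and `-Xmx` are set to the same value, the committed heap equals the maximum heap from startup. That is why `totalMemory()` matches the requested size exactly.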

I tried to run it like this:

    gatk --java-options "-Xms200G -Xmx200G -XX:ParallelGCThreads=2" \
    BaseRecalibrator \
    -I $input_bam \
    -O "${file(input_bam).baseName}.baseRecal.bam" \
    -R $reference \
    --known-sites $kg_snp \
    --known-sites $kg_indel \
    --tmp-dir /tmp \
    --disable-bam-index-caching true

but I still get the out-of-memory error. I have more memory available, but it would seem very inefficient if I had to go up to 1 TB. Why can't I get this to run? And is there an alternative approach for the MarkDuplicates, SplitNCigarReads and BaseRecalibrator steps?

Hope you can help. Best regards, Mette

MetteBoge, Mar 09 '24 00:03