gatk
Memory issues when running BaseRecalibrator
Hi, I need to run BaseRecalibrator as part of preprocessing my RNA-seq BAM files before variant calling, but I keep running out of memory. Here is the error I get:
22:30:25.477 INFO BaseRecalibrator - Start Date/Time: March 8, 2024 at 10:30:25 PM GMT
22:30:25.477 INFO BaseRecalibrator - ------------------------------------------------------------
22:30:25.477 INFO BaseRecalibrator - ------------------------------------------------------------
22:30:25.477 INFO BaseRecalibrator - HTSJDK Version: 4.1.0
22:30:25.478 INFO BaseRecalibrator - Picard Version: 3.1.1
22:30:25.478 INFO BaseRecalibrator - Built for Spark Version: 3.5.0
22:30:25.478 INFO BaseRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
22:30:25.478 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
22:30:25.478 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
22:30:25.478 INFO BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
22:30:25.478 INFO BaseRecalibrator - Deflater: IntelDeflater
22:30:25.478 INFO BaseRecalibrator - Inflater: IntelInflater
22:30:25.478 INFO BaseRecalibrator - GCS max retries/reopens: 20
22:30:25.479 INFO BaseRecalibrator - Requester pays: disabled
22:30:25.479 INFO BaseRecalibrator - Initializing engine
WARNING 2024-03-08 22:30:25 SamFiles The index file /mnt/storage/users/dockworker/mpedersen/work/RNAseq_variant_call/work/d6/362957b6215ad2e8193c27c895d42d/VR0024SA.withoutERCCs.withRG.markedDup.splitNcigar.bai was found by resolving the canonical path of a symlink: VR0024SA.withoutERCCs.withRG.markedDup.splitNcigar.bam -> /mnt/storage/users/dockworker/mpedersen/work/RNAseq_variant_call/work/d6/362957b6215ad2e8193c27c895d42d/VR0024SA.withoutERCCs.withRG.markedDup.splitNcigar.bam
22:30:25.631 INFO FeatureManager - Using codec VCFCodec to read file file://1000G_phase1.snps.high_confidence.hg38.vcf.gz
22:30:25.754 INFO FeatureManager - Using codec VCFCodec to read file file://Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
23:39:21.541 INFO BaseRecalibrator - Shutting down engine
[March 8, 2024 at 11:39:21 PM GMT] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 68.94 minutes.
Runtime.totalMemory()=214748364800
java.lang.OutOfMemoryError: Java heap space
at htsjdk.tribble.readers.TabixReader.readInt(TabixReader.java:189)
at htsjdk.tribble.readers.TabixReader.readIndex(TabixReader.java:274)
at htsjdk.tribble.readers.TabixReader.readIndex(TabixReader.java:287)
at htsjdk.tribble.readers.TabixReader.<init>(TabixReader.java:165)
at htsjdk.tribble.readers.TabixReader.<init>(TabixReader.java:129)
at htsjdk.tribble.TabixFeatureReader.<init>(TabixFeatureReader.java:80)
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:117)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:433)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:377)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:319)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:291)
at org.broadinstitute.hellbender.engine.FeatureManager.addToFeatureSources(FeatureManager.java:245)
at org.broadinstitute.hellbender.engine.FeatureManager.initializeFeatureSources(FeatureManager.java:208)
at org.broadinstitute.hellbender.engine.FeatureManager.<init>(FeatureManager.java:155)
at org.broadinstitute.hellbender.engine.ReadWalker.initializeFeatures(ReadWalker.java:72)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:726)
at org.broadinstitute.hellbender.engine.ReadWalker.onStartup(ReadWalker.java:51)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:147)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
at org.broadinstitute.hellbender.Main.main(Main.java:306)
Using GATK jar /gatk/gatk-package-4.5.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms200G -Xmx200G -XX:ParallelGCThreads=2 -jar /gatk/gatk-package-4.5.0.0-local.jar BaseRecalibrator -I VR0024SA.withoutERCCs.withRG.markedDup.splitNcigar.bam -O VR0024SA.withoutERCCs.withRG.markedDup.splitNcigar.baseRecal.bam -R GRCh38.primary_assembly.genome.fa --known-sites 1000G_phase1.snps.high_confidence.hg38.vcf.gz --known-sites Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --tmp-dir /tmp --disable-bam-index-caching true
Work dir:
/mnt/storage/users/dockworker/mpedersen/work/RNAseq_variant_call/work/71/ac26344f0e095f7fe77cbb45a334db
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
-- Check '.nextflow.log' file for details
I tried to run it like this:
gatk --java-options "-Xms200G -Xmx200G -XX:ParallelGCThreads=2" \
BaseRecalibrator \
-I $input_bam \
-O "${file(input_bam).baseName}.baseRecal.bam" \
-R $reference \
--known-sites $kg_snp \
--known-sites $kg_indel \
--tmp-dir /tmp \
--disable-bam-index-caching true
but I still get the memory error. I have more memory available, but it seems very inefficient if I need to go up to 1 TB. Why can't I make this run? And is there an alternative for the MarkDuplicates, SplitNCigarReads and BaseRecalibrator steps?
Hope you can help, BR, Mette
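A note on the trace above: the exception is raised in htsjdk's TabixReader.readIndex during GATKTool.onStartup, which suggests the heap is exhausted while loading the tabix index (.tbi) of one of the --known-sites files rather than while processing reads. One possible cause, not confirmed by the log, is a missing, empty, truncated or outdated index. The sketch below is a minimal sanity check under that assumption; `check_tbi` is a hypothetical helper name, and it only looks at the index file's presence, size and modification time, not at its contents.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of GATK or htslib): report the state of the
# tabix index expected next to a bgzipped VCF, i.e. "<vcf>.tbi".
#   MISSING_OR_EMPTY  index file is absent or has zero size
#   STALE             index file is older than the VCF it belongs to
#   OK                index file is present, non-empty and not older
check_tbi() {
    local vcf="$1"
    local tbi="${vcf}.tbi"
    if [ ! -s "$tbi" ]; then
        echo "MISSING_OR_EMPTY"
    elif [ "$tbi" -ot "$vcf" ]; then
        echo "STALE"
    else
        echo "OK"
    fi
}
```

It could be called as `check_tbi 1000G_phase1.snps.high_confidence.hg38.vcf.gz` and `check_tbi Mills_and_1000G_gold_standard.indels.hg38.vcf.gz` from the process work dir. If either index looks suspect, regenerating it with `gatk IndexFeatureFile -I <file.vcf.gz>` or `tabix -p vcf <file.vcf.gz>` and rerunning with a much smaller heap would show whether the index was the problem. In a Nextflow process the .tbi files also need to be staged alongside the VCFs.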