
gatk/haplotypecaller requests a huge Java heap when run on SGE

david-a-parry opened this issue 3 years ago • 1 comment

Snakemake version

snakemake: 6.13.0
wrapper: "v0.86.0/bio/gatk/haplotypecaller"

Describe the bug

When the gatk/haplotypecaller wrapper is run on an SGE cluster, it requests a huge Java heap.

rule call_variants:
    input:
        bam=get_sample_bams,
        ref="resources/genome.fasta",
        known="resources/variation.vcf.gz",
    output:
        gvcf="results/called/{sample}.{contig}.g.vcf.gz",
    log:
        "logs/gatk/haplotypecaller/{sample}.{contig}.log",
    params:
        java_opts="",
    wrapper:
        "v0.86.0/bio/gatk/haplotypecaller"

When run with a 16 GB bgzip-compressed resources/variation.vcf.gz file and a ~5 GB BAM file, the wrapper requests a heap size of "-Xmx47729M", and the job fails because that heap cannot be allocated.

When run with the same ~5 GB BAM file but without the large variation.vcf.gz file, it requests "-Xmx16538M": smaller, but still excessive.

It therefore seems that the wrapper infers the required Java heap size from the combined size of all input files, whereas it might be more appropriate to infer it from the reference genome file alone (?).
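A quick back-of-the-envelope check is consistent with this. Assuming the -Xmx value is simply Snakemake's default mem_mb = max(2*input.size_mb, 1000) (the formula quoted in the reply below), halving each observed heap size gives the total input size that job saw, and the difference between the two runs should be roughly the size of variation.vcf.gz. This is only a sketch of the arithmetic, not the wrapper's code; the function names are hypothetical, and it ignores any files (such as a .tbi index) that are not declared as inputs.

```python
# Observed heap sizes from this issue, in MB (-Xmx47729M and -Xmx16538M).
XMX_WITH_VCF = 47729
XMX_WITHOUT_VCF = 16538


def default_mem_mb(input_size_mb: float) -> float:
    """Snakemake's documented default: mem_mb = max(2 * input.size_mb, 1000)."""
    return max(2 * input_size_mb, 1000)


def implied_input_mb(mem_mb: float) -> float:
    """Invert the default, assuming the 2x branch (not the 1000 MB floor) applied."""
    return mem_mb / 2


with_vcf = implied_input_mb(XMX_WITH_VCF)        # BAM + reference + VCF
without_vcf = implied_input_mb(XMX_WITHOUT_VCF)  # BAM + reference only
vcf_mb = with_vcf - without_vcf                  # share attributable to the VCF

# Sanity check: feeding the implied sizes back in reproduces the observed values.
assert default_mem_mb(with_vcf) == XMX_WITH_VCF
assert default_mem_mb(without_vcf) == XMX_WITHOUT_VCF

print(f"implied total input with VCF:    {with_vcf:.1f} MB")
print(f"implied total input without VCF: {without_vcf:.1f} MB")
print(f"implied VCF size: {vcf_mb:.1f} MB (~{vcf_mb / 1024:.1f} GB)")
```

The implied VCF share comes out at about 15.6 thousand MB, on the order of the reported 16 GB file, which suggests the heap is tracking total input size rather than anything GATK-specific.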

I see the same issue with other GATK wrappers, such as gatk/genotypegvcfs.

Note that running without SGE (i.e. snakemake --use-conda) results in no heap size being requested at all.

Minimal example

Command line: snakemake --profile cluster-qsub --use-conda --cluster-config cluster_config.yaml

Cluster-qsub profile: https://github.com/jaicher/snakemake-qsub

david-a-parry, Jan 25 '22 12:01

Have you tried specifying the resources in the rule? According to the Snakemake documentation:

 If --default-resources are not specified, Snakemake uses 'mem_mb=max(2*input.size_mb, 1000)', 'disk_mb=max(2*input.size_mb, 1000)', and 'tmpdir=system_tmpdir'
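Following that suggestion, the rule can declare its memory explicitly so the input-size-based default is never used. The fragment below is the rule from the issue with a resources directive added. It is a sketch that assumes (as the documentation quote implies) the wrapper turns resources.mem_mb into the -Xmx value; the 8000 MB figure is an arbitrary placeholder, not a recommendation.

```python
rule call_variants:
    input:
        bam=get_sample_bams,
        ref="resources/genome.fasta",
        known="resources/variation.vcf.gz",
    output:
        gvcf="results/called/{sample}.{contig}.g.vcf.gz",
    log:
        "logs/gatk/haplotypecaller/{sample}.{contig}.log",
    params:
        java_opts="",
    resources:
        # Hypothetical placeholder; choose a value the SGE nodes can actually provide.
        mem_mb=8000,
    wrapper:
        "v0.86.0/bio/gatk/haplotypecaller"
```

Alternatively, passing --default-resources on the command line (e.g. --default-resources mem_mb=8000) replaces the 2*input.size_mb default for every rule that does not set mem_mb itself.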

fgvieira, Mar 21 '22 19:03

This issue was marked as stale because it has been open for 6 months with no activity.

github-actions[bot], Jan 01 '24 01:01

This issue was closed because it has been inactive for 1 month since being marked as stale. Feel free to re-open it if you have any further comments.

github-actions[bot], Feb 01 '24 01:02