snakemake-wrappers
gatk/haplotypecaller requests huge java heap memory when run on SGE
Snakemake version
snakemake: 6.13.0
wrapper: "v0.86.0/bio/gatk/haplotypecaller"
Describe the bug
When running the gatk/haplotypecaller wrapper on an SGE cluster, a huge amount of Java heap memory is requested.
rule call_variants:
    input:
        bam=get_sample_bams,
        ref="resources/genome.fasta",
        known="resources/variation.vcf.gz",
    output:
        gvcf="results/called/{sample}.{contig}.g.vcf.gz",
    log:
        "logs/gatk/haplotypecaller/{sample}.{contig}.log",
    params:
        java_opts="",
    wrapper:
        "v0.86.0/bio/gatk/haplotypecaller"
When run with a 16GB bgzip-compressed resources/variation.vcf.gz file and a ~5GB BAM file, a heap size of "-Xmx47729M" is requested, and the job fails because a heap of that size cannot be allocated.
When run without the large variation.vcf.gz file but with the same ~5GB BAM file, a more reasonable but still excessive "-Xmx16538M" is requested.
It therefore seems that the wrapper infers the required Java heap size from the combined size of all input files, whereas it might be more appropriate to infer the memory requirement from the reference genome file only(?).
I see the same issue using other GATK wrappers, such as gatk/genotypegvcfs.
Note that running without SGE (i.e. snakemake --use-conda) results in no heap size being requested.
Minimal example
rule call_variants:
    input:
        bam=get_sample_bams,
        ref="resources/genome.fasta",
        known="resources/variation.vcf.gz",
    output:
        gvcf="results/called/{sample}.{contig}.g.vcf.gz",
    log:
        "logs/gatk/haplotypecaller/{sample}.{contig}.log",
    params:
        java_opts="",
    wrapper:
        "v0.86.0/bio/gatk/haplotypecaller"
Commandline:
snakemake --profile cluster-qsub --use-conda --cluster-config cluster_config.yaml
Cluster-qsub profile: https://github.com/jaicher/snakemake-qsub
Have you tried specifying the resources in the rule? According to the Snakemake documentation:
"If --default-resources are not specified, Snakemake uses 'mem_mb=max(2*input.size_mb, 1000)', 'disk_mb=max(2*input.size_mb, 1000)', and 'tmpdir=system_tmpdir'"
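As a rough check, the heap sizes reported above can be worked back through that default formula. The sketch below assumes that the wrapper passes the job's mem_mb straight through as -Xmx; the thread does not confirm this. The function names are made up for illustration and are not part of Snakemake.

```python
# Sketch only: default_mem_mb and implied_input_size_mb are hypothetical
# helper names, not Snakemake API. The formula is the documented default
# quoted above: mem_mb = max(2 * input.size_mb, 1000).

def default_mem_mb(input_size_mb: float) -> float:
    """Default memory request (MB) for a given total input size (MB)."""
    return max(2 * input_size_mb, 1000)


def implied_input_size_mb(xmx_mb: float) -> float:
    """Invert the default formula; only valid above the 1000 MB floor."""
    return xmx_mb / 2


# Heap sizes reported in the issue (assumed here to equal mem_mb).
with_vcf = implied_input_size_mb(47729)     # 23864.5 MB of input
without_vcf = implied_input_size_mb(16538)  # 8269.0 MB of input

# The difference should roughly match the size of variation.vcf.gz.
print(with_vcf - without_vcf)  # 15595.5
```

The difference of about 15.6 GB is in line with the 16GB variation.vcf.gz file, which fits the idea that the request is driven by the total size of all inputs. If that is the cause, an explicit resources entry in the rule (for example mem_mb set to a value the node can actually provide) should take precedence over the default.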
This issue was marked as stale because it has been open for 6 months with no activity.
This issue was closed because it has been inactive for 1 month since being marked as stale. Feel free to re-open it if you have any further comments.