
use `resources: mem_gb=3` specs in all Java tools to request `-Xmx{snakemake.resources.mem_gb}G`

Open dlaehnemann opened this issue 4 years ago • 4 comments

Is your feature request related to a problem? Please describe. Specifying this via params strings makes snakemake unaware of these resources.

Describe the solution you'd like Via resources, snakemake can manage available memory in its scheduling.
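A minimal sketch of what this could look like on the wrapper side. The helper name `java_opts` and the fallback of 3 GB are assumptions made for illustration, and `SimpleNamespace` only stands in for the `snakemake.resources` object that a real wrapper would receive:

```python
from types import SimpleNamespace

# Hypothetical fallback; each tool would likely need its own default.
DEFAULT_MEM_GB = 3


def java_opts(resources, default_mem_gb=DEFAULT_MEM_GB):
    """Build a JVM heap option from a resources-like object.

    `resources` is anything exposing an optional `mem_gb` attribute.
    """
    mem_gb = getattr(resources, "mem_gb", default_mem_gb)
    return f"-Xmx{mem_gb}G"


# Stand-in for `snakemake.resources` inside a wrapper script.
resources = SimpleNamespace(mem_gb=8)
print(java_opts(resources))
```

Because the value comes from `resources` rather than from a params string, the scheduler sees the same number that ends up in `-Xmx`.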

dlaehnemann • Jun 25 '20 14:06

I will try to do that. Is `mem_gb=3` just a rough guess? For example, I think SnpEff requires more than 3 GB, but that might also depend on the VCF, doesn't it?

christopher-schroeder • Jun 26 '20 06:06

Another question: I see the benefits of dealing only with "gb", but it would also be possible to allow other units, like 512m. What if the user wants to set the amount of RAM to something smaller than 1 GB? I don't think that 0.5 GB is allowed (but I haven't tested this). So, without having thought much about it, I would rather use "mem" instead of "mem_gb" and let users choose the unit themselves.

PS: Never mind, snakemake would not be able to handle this. Stupid idea.

christopher-schroeder • Jun 26 '20 06:06

I have a suggestion. The metric unit prefixes T, G, M, k, h and da are independent of the unit itself. So it would be great if snakemake supported them, so that one could write

snakemake -j 1h -mem 512m

which would then be translated internally to 100 and (512 * 1024 * 1024). (Probably no one would use "hecto" (h), but anyway.) This would also be useful for other resources. With the internal translation to the base unit ("bytes" in the case of mem), we would avoid any problems with small units.
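The translation described here could be sketched roughly as follows. Nothing in this block is existing snakemake functionality: `parse_count` and `parse_mem_bytes` are hypothetical helpers, and treating memory suffixes as case-insensitive multiples of 1024 (as the JVM does for `-Xmx`) is an assumption of this sketch.

```python
import re

# Decimal prefixes for plain counts such as a job limit (assumed mapping).
SI_PREFIXES = {"": 1, "da": 10, "h": 100, "k": 10**3,
               "M": 10**6, "G": 10**9, "T": 10**12}

# Memory suffixes, read case-insensitively as powers of 1024 (assumption).
MEM_SUFFIXES = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}

_QUANTITY = re.compile(r"(\d+(?:\.\d+)?)([A-Za-z]*)")


def _split(text):
    """Split a string like '512m' into its number and its suffix."""
    match = _QUANTITY.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"cannot parse quantity: {text!r}")
    return float(match.group(1)), match.group(2)


def parse_count(text):
    """Translate a prefixed count such as '1h' into a plain integer."""
    number, prefix = _split(text)
    if prefix not in SI_PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return int(number * SI_PREFIXES[prefix])


def parse_mem_bytes(text):
    """Translate a memory string such as '512m' into bytes."""
    number, suffix = _split(text)
    suffix = suffix.lower()
    if suffix not in MEM_SUFFIXES:
        raise ValueError(f"unknown memory suffix: {suffix!r}")
    return int(number * MEM_SUFFIXES[suffix])
```

With these helpers, `parse_count("1h")` gives 100 and `parse_mem_bytes("512m")` gives 512 MiB expressed in bytes. Keeping counts and memory in separate functions avoids the clash between "m" as a metric prefix and "m" as a megabyte suffix.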

christopher-schroeder • Jun 26 '20 07:06

The 3 was just to have a number in there instead of some placeholder, so a different default for different tools makes sense. I think the most important point is getting a unified approach to treating memory resources in all Java tools whose bioconda packages allow specification of -Xmx.

Generally, I would think that on most machines and in most settings, specifying integer values of gigabytes should be good enough. Memory is usually available and only a limiting factor for some tools, so generous defaults would usually make sense, to avoid job failures due to limited JVM memory resources. But I do see the scenario of a tool that runs on a single core but needs a lot of memory, so that memory does become limiting when you run lots of those jobs in parallel. The easiest way out would be to allow float values, as in --resources mem_gb=0.5, and in a quick search of the snakemake code I couldn't find any requirement for a resource to be an int. So this might just work. I guess a quick test of whether snakemake complains would be a start?
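One caveat if float values do work on the snakemake side: the JVM itself only accepts whole numbers in `-Xmx`, so a fractional `mem_gb` would have to be converted to megabytes before it reaches the Java command line. A minimal sketch of that conversion, with `xmx_flag` as a hypothetical helper name:

```python
def xmx_flag(mem_gb):
    """Return a JVM heap flag for a possibly fractional number of gigabytes."""
    if mem_gb <= 0:
        raise ValueError("mem_gb must be positive")
    # Convert to whole megabytes, truncating any remainder.
    mem_mb = int(mem_gb * 1024)
    if mem_mb < 1:
        raise ValueError("mem_gb is too small to express in megabytes")
    # Use the shorter gigabyte form when the value is a whole number of GB.
    if mem_mb % 1024 == 0:
        return f"-Xmx{mem_mb // 1024}g"
    return f"-Xmx{mem_mb}m"
```

Here `xmx_flag(0.5)` yields `-Xmx512m`, while `xmx_flag(3)` still yields `-Xmx3g`.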

Also, I do see the appeal of a clean solution allowing metric units for memory and introducing a separate memory resource command-line parameter, but I'm not sure whether this would be over-engineering the problem for now. Maybe @johanneskoester can comment on the idea, and if we all agree to pursue it, this problem would head over to the snakemake repo first... :D

dlaehnemann • Jun 26 '20 08:06