modules Improve module-specific resource requests

Meta issue from #6628 for more discussion and dropping some resources.

We might need to do some benchmarking of the biggest bottleneck modules.

Sep 11 '24 21:09 edmundmiller

Huge samtools sort analysis
Samtools sort can use more memory than you give it, since its -m limit applies per thread: https://github.com/snakemake/snakemake-wrappers/pull/2648/files
efficiency per CPU task
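A minimal sketch of how the per-thread limit could be derived from a task's allocation, since `samtools sort -m` sets memory per thread rather than for the whole job. The function names and the 20% headroom are assumptions for illustration, not values taken from the linked PR.

```python
def sort_mem_per_thread_mb(task_memory_mb: int, threads: int, headroom_pct: int = 20) -> int:
    """Return a per-thread value (in MB) to pass to `samtools sort -m`.

    headroom_pct is an assumed safety margin for memory that samtools
    uses outside its sort buffers; tune it from real measurements.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    if not 0 <= headroom_pct < 100:
        raise ValueError("headroom_pct must be in [0, 100)")
    usable_mb = task_memory_mb * (100 - headroom_pct) // 100
    return max(1, usable_mb // threads)


def sort_command(bam_in: str, bam_out: str, task_memory_mb: int, threads: int) -> str:
    """Build a samtools sort command whose total buffers fit the allocation."""
    per_thread = sort_mem_per_thread_mb(task_memory_mb, threads)
    return f"samtools sort -@ {threads} -m {per_thread}M -o {bam_out} {bam_in}"


if __name__ == "__main__":
    print(sort_command("in.bam", "sorted.bam", task_memory_mb=8192, threads=4))
```

Dividing by the thread count keeps the combined sort buffers under the task's memory request instead of multiplying it.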

Sep 11 '24 21:09 edmundmiller

Resource limits will probably help, for example with bismark align.

Sep 11 '24 21:09 edmundmiller

Hmmm, I implemented the samtools sort memory setting in my in-house pipeline last week...

Nov 19 '24 15:11 SPPearce

Two different ways we can handle these:

Setting the memory dynamically for algorithmic tools. We can take a file size, number of files, or runtime requirement and calculate a value pretty close to the CPUs/memory actually needed. https://github.com/nf-core/modules/pull/6628
Capping requests at the point of diminishing returns with resourceLimits (users can take them off if they really want to). https://github.com/nf-core/modules/pull/7173
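The two approaches combine naturally. Here is a rough sketch in Python, assuming a simple linear model of memory against input size. The function name, the base value, and the scaling factor are all hypothetical and would need benchmarking per tool.

```python
import math
from typing import Optional


def estimate_memory_gb(
    input_bytes: int,
    base_gb: float = 1.0,           # assumed fixed overhead of the tool
    gb_per_input_gb: float = 2.0,   # assumed scaling factor, to be benchmarked
    attempt: int = 1,               # retry number; the request grows on each retry
    limit_gb: Optional[int] = None, # cap, playing the role of a resource limit
) -> int:
    """Estimate a memory request from input size, then clamp it to a limit."""
    input_gb = input_bytes / 1024**3
    wanted = math.ceil((base_gb + gb_per_input_gb * input_gb) * attempt)
    if limit_gb is not None:
        wanted = min(wanted, limit_gb)
    return max(1, wanted)


if __name__ == "__main__":
    five_gib = 5 * 1024**3
    print(estimate_memory_gb(five_gib))                          # first attempt
    print(estimate_memory_gb(five_gib, attempt=2, limit_gb=16))  # capped retry
```

The dynamic estimate does the sizing, and the cap only kicks in once more memory stops being useful or available.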

Dec 06 '24 17:12 edmundmiller

Original post from @MatthiasZepper

Hello everybody, above I have posted a small poll regarding the process labels that are added to modules to specify the required resources. Typically, Nextflow processes in nf-core pipelines are not really optimised regarding the resources they request (owing to the problem that people process very different sample sizes with the pipelines and, e.g., the runtime or the memory is hardly predictable depending on the reference genome). For the umi-tools modules, for example, I put process_single instead of process_medium when bumping the module from version 1.12 to 1.14, because the tool is not multi-threaded anyway and process_medium requests 8 CPUs. However, this bites me now in the RNA-seq pipeline, because umi-tools extract has very long runtimes and umi-tools dedup needs excessive amounts of memory, and the process_single definition in the base.config has too little of both. Changing this would, however, again affect other really tiny modules using the same label.

Therefore, I was thinking of having atomic process labels for each category separately that would work in conjunction. It is the same philosophy that Tailwind uses for CSS, e.g. to have memory_1GB, memory_5GB, memory_10GB labels (or alternatively memory_tiny, memory_small, memory_huge etc.) and the same for CPUs and runtimes.
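To illustrate how atomic labels could combine, here is a small Python sketch that resolves one label per category into a resource request. Only the memory_* names come from the post. The cpus_* and time_* labels, the defaults, and the function name are hypothetical.

```python
# Each atomic label sets exactly one resource category.
ATOMIC_LABELS = {
    "cpus_1": ("cpus", 1),          # hypothetical label
    "cpus_8": ("cpus", 8),          # hypothetical label
    "memory_1GB": ("memory_gb", 1),
    "memory_5GB": ("memory_gb", 5),
    "memory_10GB": ("memory_gb", 10),
    "time_4h": ("time_h", 4),       # hypothetical label
    "time_24h": ("time_h", 24),     # hypothetical label
}

# Assumed fallbacks for categories that no label sets.
DEFAULTS = {"cpus": 1, "memory_gb": 1, "time_h": 4}


def resolve_resources(labels):
    """Combine atomic labels into one request, at most one label per category."""
    resources = dict(DEFAULTS)
    seen = set()
    for label in labels:
        if label not in ATOMIC_LABELS:
            continue  # unrelated labels are ignored
        category, value = ATOMIC_LABELS[label]
        if category in seen:
            raise ValueError(f"more than one label sets {category}")
        seen.add(category)
        resources[category] = value
    return resources


if __name__ == "__main__":
    # A single-threaded, memory-hungry, long-running step such as umi-tools dedup.
    print(resolve_resources(["cpus_1", "memory_10GB", "time_24h"]))
```

With labels like these, a single-threaded but memory-hungry module can raise memory and runtime without also requesting extra CPUs.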

Comment from @drpatelh: https://nfcore.slack.com/archives/CJRH30T6V/p1680092402658969?thread_ts=1680090424.544209&cid=CJRH30T6V

Mar 04 '25 19:03 edmundmiller