Set BWA and bwamem2 index memory dynamically
We kept having bwamem2 index tasks that ran forever and then failed. Updated the bwamem2 index module to request 28 bytes of memory per byte of fasta. Issue for reference: https://github.com/bwa-mem2/bwa-mem2/issues/9
I also tracked down the required memory for bwa index while I was at it. It doesn't seem to fail in practice because most genomes need less memory than the default request.
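For reference, a minimal sketch of what the module-level directive could look like; the process body and the exact bwa-mem2 command line here are illustrative, not the actual module code:

```nextflow
process BWAMEM2_INDEX {
    // Sketch: request ~28 bytes of RAM per byte of fasta,
    // per https://github.com/bwa-mem2/bwa-mem2/issues/9
    memory { 28.B * fasta.size() }

    input:
    tuple val(meta), path(fasta)

    output:
    tuple val(meta), path("bwamem2"), emit: index

    script:
    """
    mkdir bwamem2
    bwa-mem2 index -p bwamem2/${fasta.baseName} $fasta
    """
}
```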
This isn't the first place people have run into this: https://github.com/nf-core/modules/pull/6628
I like it 👍 it’ll also play nice with the new resourceLimits directive
I was talking to @drpatelh about this earlier this week. Sounds good. Very neat if it scales in such a linear way.
Should we add a baseline of additional memory?
Closes https://github.com/nf-core/sarek/issues/1377
What can we do for bwa mem and bwamem2 mem?
> What can we do for bwa mem and bwamem2 mem?
What do you mean?
Is it right to have these settings hardcoded in the module? How does it interact with a pipeline-level config file doing

```nextflow
withName: 'BWAMEM2_INDEX' {
    memory = { ... }
}
```

Which one takes precedence?
AFAIK the pipeline config takes precedence over the config hardcoded in the module.
If you're worried about requesting too many resources, the resourceLimits directive should take care of that nicely.
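For example, a pipeline- or site-level config could cap whatever the module asks for; the numbers below are placeholders, not recommendations:

```nextflow
// Hypothetical site config: caps apply to every process, including BWAMEM2_INDEX
process {
    resourceLimits = [ cpus: 32, memory: 512.GB, time: 48.h ]
}
```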
I was more worried that 28 GB / Gbp is still too high. I use 24 GB / Gbp in my pipelines and wouldn't want nf-core to force me to waste RAM ;)
Also, your memory definition doesn't consider task.attempt. Are you absolutely certain that 28 GB / Gbp will work for every genome? nf-core resource definitions usually factor in task.attempt.
I wasn't worried about the missing check_max, since nf-core is about to mandate a recent Nextflow version that supports resourceLimits.
FYI, I've just checked our LSF logs and there have been zero memory failures over the 1,698 BWAMEM2_INDEX processes that we ran in 2024 with 24 GB/Gbp.
Median memory efficiency is ~76% and goes up to 95%, meaning that 23 GB/Gbp might still work for all genomes (it's right at the limit), but 22 GB/Gbp would certainly yield some memory errors.
Regardless of the scaling factor you use, I'd still keep task.attempt just in case (I'm overcautious!).
> Regardless of the scaling factor you use, I'd still keep task.attempt just in case (I'm overcautious!).
I think that's a great point; we should definitely add that to these.
Any opinions on the scaling factor? Should we really double it every time, or would 1.5x be generous enough? A possible shape for this is sketched below.
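To make the suggestion concrete, here is a hedged sketch of combining the fasta-size scaling with a per-attempt multiplier; the 1.5x factor and the retry count are placeholders, not values agreed in this thread, and the process body is the same illustrative one as above:

```nextflow
process BWAMEM2_INDEX {
    // Sketch: grow the request by 50% on each retry (the factor is illustrative)
    memory        { 28.B * fasta.size() * Math.pow(1.5, task.attempt - 1) }
    errorStrategy 'retry'
    maxRetries    2

    input:
    tuple val(meta), path(fasta)

    output:
    tuple val(meta), path("bwamem2"), emit: index

    script:
    """
    mkdir bwamem2
    bwa-mem2 index -p bwamem2/${fasta.baseName} $fasta
    """
}
```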
> FYI, I've just checked our LSF logs and there have been zero memory failures over the 1,698 BWAMEM2_INDEX processes that we ran in 2024 with 24 GB/Gbp.
Power to you! 😆 I vote we go with what the bwamem2 issue says it's expected to use, unless you have a better link to point at when people start asking why their jobs keep failing.
Let's merge it and see if anyone has issues; it had gone stale except for @muffato's comments, which I've addressed. We can always revert if it breaks something.
I've noticed a small issue with this when running on a small genome with docker:
```
docker: Error response from daemon: Minimum memory limit allowed is 6MB.
```
But nothing that can't be fixed with a label selector
We could also just fix it in the module: make sure the computed value isn't zero or less than 6 MB.
We could take the max of that minimum requirement and the memory we compute there.
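Something along these lines (sketch only; 6.MB matches Docker's minimum, the rest of the process is the same illustrative body as above):

```nextflow
process BWAMEM2_INDEX {
    // Sketch: never request less than Docker's 6 MB floor,
    // even for very small test genomes
    memory { [ 28.B * fasta.size(), 6.MB ].max() }

    input:
    tuple val(meta), path(fasta)

    output:
    tuple val(meta), path("bwamem2"), emit: index

    script:
    """
    mkdir bwamem2
    bwa-mem2 index -p bwamem2/${fasta.baseName} $fasta
    """
}
```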