How much RAM would be a good fit
Hello,
I wanted to ask how much RAM is a reasonable amount to set aside for a metaSPAdes run?
Also, would it make sense to keep some extra memory besides the set limit, e.g. only give metaSPAdes access to 95% of it and keep the rest as a buffer? If so, what would be a good ratio?
Background: on our HPC systems we set a hard limit on the allowed memory, and apparently metaSPAdes uses more than what is specified via the command line (we use the hard limit as the value for the memory parameter).
So we were wondering whether we should subtract a constant amount, or a fraction of the hard limit, to be on the safe side.
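For concreteness, this is roughly what we have in mind; the 0.9 ratio is just a guess rather than a recommendation, and the SLURM_MEM_PER_NODE variable and file names are only placeholders for our setup:

```python
import os
import subprocess

# Illustrative sketch only: derive the -m value from the scheduler's hard
# limit, keeping a safety margin. The 0.9 ratio is a guess, not a
# recommendation; SLURM_MEM_PER_NODE (in MB) and the file names are
# placeholders.
hard_limit_gb = int(os.environ["SLURM_MEM_PER_NODE"]) // 1024
spades_mem_gb = max(1, int(hard_limit_gb * 0.9))

subprocess.run([
    "spades.py", "--meta",
    "-1", "reads_1.fastq.gz",
    "-2", "reads_2.fastq.gz",
    "-o", "metaspades_out",
    "-m", str(spades_mem_gb),
], check=True)
```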
Well, as specified in the manual (https://ablab.github.io/spades/running.html#advanced-options):
-m (or --memory): Set memory limit in Gb. SPAdes terminates if it reaches this limit. The default value is 250 Gb. Actual amount of RAM consumed will be below this limit.
However, "used memory" is a rather vague notion. HPC job schedulers have all invented their own ways to monitor and check memory consumption, and there is quite a bit of disagreement between them.
To be precise, this SPAdes option sets the upper limit of the SPAdes process address space (virtual memory) via a setrlimit call (https://linux.die.net/man/2/setrlimit, see RLIMIT_AS). We know this does not quite work on Mac OS, but on Linux it has worked reliably. Once the process exhausts its virtual memory, any further memory allocation is expected to fail and/or the process will be terminated by the OS.
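To illustrate what that means (this is not SPAdes code, only a minimal Python sketch of the same mechanism):

```python
import resource

# Not SPAdes code: a minimal sketch of the same mechanism. Capping
# RLIMIT_AS limits the process address space (virtual memory); once the
# cap is reached, further allocations fail (MemoryError in Python,
# std::bad_alloc in C++) instead of the process growing further.
limit_gb = 250  # corresponds to the -m / --memory value, in Gb
limit_bytes = limit_gb * 1024 ** 3
resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
```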
We cannot say whether this limit coincides with the "memory limit" option on your HPC system. For example, SLURM (via cgroups) limits the resident set size (RSS), which is always smaller than virtual memory unless swap or some kind of shared memory is involved.
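If it helps, here is a quick, Linux-only way to eyeball the gap between the two numbers for a running process; the PID is hypothetical, and this is not how SLURM itself does the accounting:

```python
# Compare virtual memory (what RLIMIT_AS counts, reported as VmSize) with
# resident set size (what cgroups/SLURM typically account for, VmRSS).
# The PID below is hypothetical.
def vm_vs_rss(pid: int) -> dict:
    fields = {}
    with open(f"/proc/{pid}/status") as fh:
        for line in fh:
            if line.startswith(("VmSize:", "VmRSS:")):
                key, value, unit = line.split()
                fields[key.rstrip(":")] = f"{value} {unit}"
    return fields

print(vm_vs_rss(12345))  # e.g. {'VmSize': '... kB', 'VmRSS': '... kB'}
```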
Hi, I'm also dealing with this issue. I had to give a very dirty metagenome (wastewater) nearly 1 TB of RAM, and that wasn't a massive dataset, maybe ~400M PE Illumina reads. I then added a long-read dataset for a hybrid approach and raised the maximum, and now I'm beginning to hit the limits of the compute nodes I have access to. Combining multiple datasets appears quite computationally expensive (especially as read depth increases). Is there a set of parameters to better manage memory allocation in metaSPAdes besides just setting a max limit, using the biggest compute node, and hoping for the best?