spades
Determining optimal -m memory and -t threads settings
While running a mammalian (chromosome-specific) assembly with ~20 million reads, the k=21 and k=33 stages complete successfully, but the k=55 stage then fails with:
<jemalloc>: Error in malloc(): out of memory. Requested: 64, active: 210579226624
I've tried playing with the memory parameter, setting -m to 120, 256, 500, etc., but the run keeps failing with this error before completion.
Could you provide any ideas on how to determine the optimal -m? The node I'm using has 64 processors and a maximum of 512 GB of memory.
Would reducing the number of threads help? How does SPAdes decide how much RAM to request? Are there any other parameters I could try to get this assembly to complete? I'm open to any suggestions.
Parameters:
System information:
  SPAdes version: 3.6.1
  Threads: 64
  Memory limit (in Gb): 250
  Python version: 2.7.13
  OS: Linux-3.2.0-4-amd64-x86_64-with-debian-7.5
  k: [21, 33, 55]
  Mismatch careful mode is turned OFF
  Repeat resolution is enabled
  MismatchCorrector will be SKIPPED
  Coverage cutoff is turned OFF
The answer is easy - SPAdes does not decide anything. The amount of RAM required depends on your input dataset: coverage, error rate, repeat content, etc. In the majority of cases we cannot determine in advance how much RAM will be required.
The -m option acts as a last-resort precaution: it sets a hard memory limit, and SPAdes will simply crash if it needs to allocate more RAM than this limit. In some cases it is possible to trade time for memory, and we use the value passed to -m for this. In general, though, you should simply pass to -m the amount of free RAM available to your SPAdes job. The same goes for the -t option: pass the number of threads you can use for the assembly.
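As a concrete illustration of that advice, here is a minimal sketch that wires -t and -m to the resources actually granted to the job. The file names, output directory, and numbers are placeholders, not values from this thread.

```python
# Minimal sketch: pass the job's actual resource grant to SPAdes via -t/-m.
# All paths and numbers here are illustrative placeholders.
def build_spades_cmd(reads1, reads2, outdir, threads, mem_gb):
    """Assemble a spades.py command line with explicit thread/memory limits."""
    return ["spades.py", "-1", reads1, "-2", reads2, "-o", outdir,
            "-t", str(threads), "-m", str(mem_gb)]

cmd = build_spades_cmd("reads_1.fq", "reads_2.fq", "asm_out", 64, 500)
print(" ".join(cmd))
# -> spades.py -1 reads_1.fq -2 reads_2.fq -o asm_out -t 64 -m 500
```

On a batch scheduler, the thread and memory arguments would typically come from the job allocation rather than being hard-coded.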
The message you mentioned contains the explanation: SPAdes had ~196 Gb of RAM allocated, and your OS failed to fulfil its request to allocate just 64 bytes more. So make sure you indeed have 512 Gb of free RAM.
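To sanity-check that the node really has the free RAM you plan to hand to -m, something like the following can help. It parses the Linux /proc/meminfo format; the 10% headroom factor is just an assumption, not a SPAdes recommendation.

```python
# Hedged sketch: suggest an -m value (in Gb) from MemAvailable rather than
# the node's nominal total. Expects Linux /proc/meminfo text; the 10%
# headroom is an arbitrary safety margin.
def suggest_memory_limit_gb(meminfo_text):
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])           # value is reported in kB
            return int(kib / 1024 / 1024 * 0.9)  # kB -> Gb, minus headroom
    return None

sample = "MemTotal:       1056730092 kB\nMemAvailable:   1046478848 kB\n"
print(suggest_memory_limit_gb(sample))  # -> 898
```

On a real node you would read /proc/meminfo itself instead of the sample string; MemAvailable is a better guide than MemFree because it accounts for reclaimable caches.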
I'm having this problem too on our compute server with 1 TB of RAM. Here's the Spades error:
<jemalloc>: Error in malloc(): out of memory. Requested: 365790526992, active: 127263571968
Here's the free memory on the server:
total used free shared buff/cache available
Mem: 1.0T 7.1G 991G 21M 9.2G 998G
Swap: 4.0G 0B 4.0G
System information:
SPAdes version: 3.10.1
Python version: 3.4.5
OS: Linux-3.10.0-514.10.2.el7.x86_64-x86_64-with-redhat-7.3-Maipo
Tom, the description of the problem was given above: at the time of the error SPAdes had ~120 Gb of RAM allocated, but your OS failed to fulfill SPAdes' request to allocate some 340 Gb more.
Thanks for the answer! Any idea why that would happen when there is more than 800 Gb free?
There might be multiple issues, ranging from virtual-memory fragmentation to memory overcommit by other applications. You may want to ask your system administrator how this particular server is configured.
Just to reiterate: reducing the number of threads won't reduce the amount of memory that is required, correct?
It depends on the stage of the process. Some stages require a fixed amount of RAM per thread (though we try to estimate the amount of free memory from the specified memory limit and the amount already consumed, and use this estimate to cap per-thread allocations); other stages process all the data regardless, so their memory consumption is not thread-dependent.
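The per-thread budgeting described above can be sketched roughly like this. The per-thread cost and all figures are invented for illustration; they are not SPAdes internals.

```python
# Rough sketch of the budgeting described above: for stages with a fixed
# per-thread RAM cost, the thread count that can actually run is capped by
# whatever fits under the memory limit. All figures are illustrative.
def usable_threads(mem_limit_gb, mem_consumed_gb, per_thread_gb, requested_threads):
    free_gb = mem_limit_gb - mem_consumed_gb
    return max(1, min(requested_threads, free_gb // per_thread_gb))

# 250 Gb limit, ~196 Gb already in use, hypothetical 2 Gb per thread:
print(usable_threads(250, 196, 2, 64))  # -> 27
```

This is why fewer threads can help on per-thread stages but makes no difference on stages that must hold the whole dataset in memory at once.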