
Determining optimal -m (memory) and -t (threads) settings

Open md5sam opened this issue 7 years ago • 7 comments

While running a mammalian (chromosome-specific) assembly with ~20 million reads, the stages with k=21 and k=33 complete successfully. The k=55 step then fails with

<jemalloc>: Error in malloc(): out of memory. Requested: 64, active: 210579226624

I've tried playing with the memory parameter, setting -m to 120, 256, 500, etc., but the run keeps failing with this error.

Could you provide any ideas on how to determine the optimal -m? The node I'm using has 64 processors and a maximum of 512 GB of memory.

Would reducing the number of threads help? How does SPAdes decide how much RAM to request? Are there any other parameters I could try to get this assembly to complete? I'm open to any suggestions.

Parameters:

System information:
  SPAdes version: 3.6.1
  Threads: 64
  Memory limit (in Gb): 250
  Python version: 2.7.13
  OS: Linux-3.2.0-4-amd64-x86_64-with-debian-7.5
  k: [21, 33, 55]
  Mismatch careful mode is turned OFF
  Repeat resolution is enabled
  MismatchCorrector will be SKIPPED
  Coverage cutoff is turned OFF
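
For reference, the command line looked roughly like this (the read file names and output directory are placeholders, not the actual paths):

    spades.py -1 reads_R1.fastq.gz -2 reads_R2.fastq.gz \
        -o chr_assembly -k 21,33,55 -m 250 -t 64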

md5sam avatar May 09 '17 18:05 md5sam

The answer is easy: SPAdes does not decide anything. The amount of RAM required depends on your input dataset: coverage, error rate, repeat content, etc. In the majority of cases we cannot determine in advance how much RAM will be required.

The -m option acts as a last-resort precaution: it sets a hard memory limit, and SPAdes will simply crash if it needs to allocate more RAM than this limit. In some cases it is possible to trade time for memory, and we use the value passed to -m for this. In general, though, you'd simply pass to -m the amount of free RAM available to your SPAdes job. The same goes for the -t option: pass the number of threads you can use for the assembly.
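
For example, here's a minimal sketch that sizes both from the node itself (the read files are placeholders, and this assumes a procps version of free that reports an "available" column, plus coreutils nproc):

    # Pass the available RAM (GiB) and the core count straight to SPAdes.
    FREE_GB=$(free -g | awk '/^Mem:/ {print $7}')
    THREADS=$(nproc)
    spades.py -1 reads_R1.fastq.gz -2 reads_R2.fastq.gz \
        -o assembly_out -m "${FREE_GB}" -t "${THREADS}"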

The message you mentioned clearly describes the error: SPAdes had 196 Gb of RAM allocated (the "active" figure), and your OS failed to fulfil its request to allocate 64 bytes more. So, just make sure you do indeed have 512 Gb of free RAM.
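
The conversion from the jemalloc figure is plain integer arithmetic (bash):

    $ echo $(( 210579226624 / 1024 ** 3 ))
    196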

asl avatar May 09 '17 21:05 asl

I'm having this problem too on our compute server with 1 TB of RAM. Here's the SPAdes error:

<jemalloc>: Error in malloc(): out of memory. Requested: 365790526992, active: 127263571968

Here's the free memory on the server:

              total        used        free      shared  buff/cache   available
Mem:           1.0T        7.1G        991G         21M        9.2G        998G
Swap:          4.0G          0B        4.0G

System info:

System information:
  SPAdes version: 3.10.1
  Python version: 3.4.5
  OS: Linux-3.10.0-514.10.2.el7.x86_64-x86_64-with-redhat-7.3-Maipo

TomHarrop avatar Aug 14 '17 20:08 TomHarrop

Tom, the description of the problem was given above. At the time of the error, SPAdes had ~120 Gb of RAM allocated; your OS then failed to fulfill SPAdes' request to allocate some 340 Gb more.
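
Converting both jemalloc figures to GiB makes this concrete (bash integer arithmetic):

    $ echo $(( 127263571968 / 1024 ** 3 )) $(( 365790526992 / 1024 ** 3 ))
    118 340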

asl avatar Aug 14 '17 21:08 asl

Thanks for the answer! Any idea why that would happen when there is more than 800 Gb free?

TomHarrop avatar Aug 14 '17 21:08 TomHarrop

There might be multiple issues, ranging from virtual memory fragmentation to memory overcommitted by other apps. You may want to ask your system administrator how this particular server is configured.
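
As a hedged illustration, these are the kinds of settings an administrator might inspect (the commands are standard Linux; whether they explain this particular failure is an assumption):

    # Kernel overcommit policy: in mode 2 (strict accounting), the commit
    # limit is swap + overcommit_ratio% of RAM, and an allocation that would
    # push Committed_AS past CommitLimit fails even with free physical RAM.
    sysctl vm.overcommit_memory vm.overcommit_ratio
    grep -i commit /proc/meminfo   # compare CommitLimit vs Committed_AS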

asl avatar Aug 14 '17 22:08 asl

Just to reiterate: reducing the number of threads won't reduce the amount of memory that is required, correct?

fanhuan avatar May 21 '20 08:05 fanhuan

It depends on the stage of the pipeline. Some stages require a fixed amount of RAM per thread (though we try to estimate the amount of free memory from the specified memory limit and the amount already consumed, and use this estimate to cap the per-thread allocations). Other stages process all the data at once, and their memory consumption is not thread-dependent.
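
Purely as an illustration of that estimate (back-of-the-envelope arithmetic, not SPAdes' actual internal formula; all figures are hypothetical):

    MEM_LIMIT_GB=250   # value passed via -m
    CONSUMED_GB=180    # RAM already consumed at this stage (hypothetical)
    THREADS=64         # value passed via -t
    PER_THREAD_GB=$(( (MEM_LIMIT_GB - CONSUMED_GB) / THREADS ))
    echo "per-thread buffer cap: ${PER_THREAD_GB} GB"   # prints 1

So dropping -t can shrink the footprint of per-thread stages, but not of stages that hold the whole dataset in memory.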

asl avatar May 21 '20 09:05 asl