metaSPAdes out-of-memory errors on medium-size dataset
Description of bug
I am running metaSPAdes on a co-assembly of 6 samples totalling 19 GB of raw reads given to metaSPAdes. Despite this being a fairly typical size for a metagenomic dataset, I get out-of-memory errors when giving it 170 GB of memory (and 16 threads). I know this is a me problem, but I am unsure how to proceed, as that is already a huge amount of memory and it seems crazy that it OOMs.
Do you have any advice on how I could proceed? metaSPAdes produces better assemblies than other tools out there, so I would like to keep using it.
I understand there are other factors, but would it be possible for you to provide memory-usage data for metaSPAdes, similar to what you provide for SPAdes in the README?
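(For reference, the failing run had this general shape; the sketch below uses placeholder file names, and the exact command line is in the attached params.txt.)

```bash
# Sketch of the run described above; file names are placeholders,
# the real command line is in the attached params.txt.
spades.py --meta \
    -1 reads_R1.fastq.gz \
    -2 reads_R2.fastq.gz \
    -t 16 \
    -m 170 \
    -o metaspades_out
```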
spades.log
params.txt
SPAdes version
3.14.0
Operating System
CentOS Linux release 7.6.1810 (Core)
Python Version
No response
Method of SPAdes installation
Manual
No errors reported in spades.log
- [ ] Yes
Consider upgrading to the latest SPAdes 3.15.4
The error still occurs on the latest version of SPAdes
@Lamm-a, will you please post the new spades.log?
This run is slightly different from the one above. The forward and reverse reads are 27 GB each, and I have allocated 178 GB with 16 cores.
spades.log
SPAdes version 3.15.3 (not the absolute latest, so I am re-running).
Yes, you definitely need more RAM for such large datasets. You have ~15 billion distinct k-mers there.
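(Rough arithmetic, as an illustration only and not SPAdes's actual accounting: if a plain hash table cost on the order of 16 bytes per distinct k-mer, then 15 × 10⁹ k-mers × 16 B ≈ 240 GB, already beyond the 170 GB limit before any graph structures are built. SPAdes's internal structures differ, but the order of magnitude is the point.)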
Any predictions on how much? And what is the typical size of the metagenomic raw read sets people use?
Would skipping k-mer 21 drastically reduce the quality of the assembly? It seems like that would reduce the k-mer count by a lot.
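(For context, "skipping k-mer 21" would mean overriding the default k ladder with the -k option, as in the sketch below; the values are illustrative, since metaSPAdes normally chooses its ladder from the read length.)

```bash
# Start the k-mer ladder at 33 instead of 21 (values illustrative only).
spades.py --meta -k 33,55,77 \
    -1 reads_R1.fastq.gz -2 reads_R2.fastq.gz \
    -t 16 -m 170 -o metaspades_k33
```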
There is no way to predict the memory usage before the assembly as it depends on many things including the genome size, repeat content, error rate, strain content, etc.
Skipping shorter k-mers will not reduce the k-mer count in any way. Quite the opposite.
> Skipping shorter k-mers will not reduce the k-mer count in any way. Quite the opposite.
Oh how come that is the case?
Would you suggest now merging these into a co-assembly then? It looks like the amount of memory I would need will be prohibitive for me.
> Oh how come that is the case?
Pretty simple. We are talking about the # of distinct k-mers here. Also observe that each sequencing error introduces k non-genomic k-mers.
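(To put illustrative numbers on this claim: assume roughly 2 × 10¹⁰ sequenced bases, a 1% error rate, and k = 55. That gives about 2 × 10⁸ errors, each corrupting up to 55 overlapping 55-mers, i.e. on the order of 10¹⁰ mostly unique non-genomic k-mers. These figures are assumptions for the sake of the example, not measurements from this dataset.)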
> Would you suggest now merging these into a co-assembly then? It looks like the amount of memory I would need will be prohibitive for me.
What do you want to merge? You have just a single sample here. Also, merging samples and co-assembling requires much more RAM, as you'd be assembling all the data at once.
> What do you want to merge? You have just a single sample here. Also, merging samples and co-assembling requires much more RAM, as you'd be assembling all the data at once.
Sorry, that was a typo; I meant not merging.
Well, you could certainly try merging fewer samples and see what the outcome will be.
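(A per-sample alternative to the full co-assembly could look like the sketch below; the sample names are placeholders.)

```bash
# Assemble each sample on its own; peak RAM then scales with a single
# sample's distinct k-mer content rather than the pooled dataset's.
for s in sampleA sampleB sampleC; do
    spades.py --meta -t 16 -m 170 \
        -1 "${s}_R1.fastq.gz" \
        -2 "${s}_R2.fastq.gz" \
        -o "metaspades_${s}"
done
```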
> Pretty simple. We are talking about the # of distinct k-mers here. Also observe that each sequencing error introduces k non-genomic k-mers.
Ah, I don't think I understand what is going on with the k-mers then. I understood it as each k-mer length generating its own set of k-mers.
Hello, I would like to know whether you solved this problem and, if so, how you solved it. I'm having the same problem at the moment and hope to get some advice from you. Thanks.