metaSPADES out of memory errors on medium size dataset

Open Rob-murphys opened this issue 2 years ago • 13 comments

Description of bug

I am running metaSPADES on a co-assembly of 6 samples totalling 19 GB of raw reads. Despite this being a fairly typical size for a metagenomic dataset, I get out-of-memory errors when giving it 170 GB of memory (and 16 threads). I know this is a me problem, but I am unsure how to proceed, as that is already a huge amount of memory and it seems crazy that it OOMs.

Do you have any advice on how I could proceed? metaSPADES provides better assemblies than other tools out there, so I would like to use it.

I understand there are other factors but is it possible for you to provide usage data similar to how you do for spades in the README?

spades.log

params.txt

SPAdes version

3.14.0

Operating System

CentOS Linux release 7.6.1810 (Core)

Python Version

No response

Method of SPAdes installation

Manual

No errors reported in spades.log

  • [ ] Yes

Rob-murphys avatar Jun 15 '22 15:06 Rob-murphys

Consider upgrading to the latest SPAdes 3.15.4

asl avatar Jun 15 '22 16:06 asl

The error still occurs on the latest version of SPAdes

Rob-murphys avatar Aug 01 '22 13:08 Rob-murphys

@Lamm-a, could you please post the new spades.log?

asl avatar Aug 01 '22 14:08 asl

This is a slightly different version to the above. The forward and reverse reads are 27G each. I have allocated 178G with 16 cores.

spades.log

SPAdes version 3.15.3 (not the very latest, so I am rerunning)

Rob-murphys avatar Aug 02 '22 07:08 Rob-murphys

Yes, you definitely need more RAM for such large datasets. You have ~15 billion k-mers there.

asl avatar Aug 02 '22 07:08 asl

Any predictions on how much? What is the typical size of metagenomic raw reads used?

Would skipping k=21 drastically reduce the quality of the assembly? I assume that would reduce the k-mer count by a lot.

Rob-murphys avatar Aug 02 '22 07:08 Rob-murphys

There is no way to predict the memory usage before the assembly as it depends on many things including the genome size, repeat content, error rate, strain content, etc.

Skipping shorter k-mers will not reduce the k-mer count in any way. Quite the opposite.

asl avatar Aug 02 '22 07:08 asl

> Skipping shorter k-mers will not reduce the k-mer count in any way. Quite the opposite.

Oh how come that is the case?

Would you suggest now merging these into a co assembly then? As it looks like the amount of memory I will need will be prohibitive for me.

Rob-murphys avatar Aug 02 '22 07:08 Rob-murphys

> Oh how come that is the case?

Pretty simple. We are talking about the # of distinct k-mers here. Also observe that each sequencing error introduces k non-genomic k-mers.
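The point about sequencing errors can be illustrated with a minimal sketch. This is not SPAdes code: the toy random "genome", the read coordinates, and the helper names `distinct_kmers` and `count_error_kmers` are all assumptions made up for the illustration. It counts how many distinct k-mers a single substitution error adds on top of the genomic ones, for several values of k.

```python
import random


def distinct_kmers(seq, k):
    """Return the set of distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def count_error_kmers(genome, read, k):
    """Count k-mers in the read that do not occur anywhere in the genome."""
    return len(distinct_kmers(read, k) - distinct_kmers(genome, k))


random.seed(0)
# Toy "genome": 2000 random bases (hypothetical data, for illustration only).
genome = "".join(random.choice("ACGT") for _ in range(2000))

# A 150 bp read copied from the genome, with one substitution in the middle.
read = genome[500:650]
pos = 75
wrong_base = "A" if read[pos] != "A" else "C"
bad_read = read[:pos] + wrong_base + read[pos + 1:]

for k in (21, 33, 55):
    # Every k-mer window overlapping the erroneous base is a candidate
    # non-genomic k-mer; there are at most k such windows.
    print(k, count_error_kmers(genome, bad_read, k))
```

Because up to k windows overlap any single erroneous base, the number of error-derived k-mers per error grows with k, which is consistent with the statement that dropping the shorter k values does not shrink the distinct k-mer count.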

> Would you suggest now merging these into a co assembly then? As it looks like the amount of memory I will need will be prohibitive for me.

What do you want to merge? You have just a single sample here. Also, merging samples and co-assembling requires much more RAM, as you'd assemble all the data at once.

asl avatar Aug 02 '22 08:08 asl

> What do you want to merge? You have just a single sample here. Also, merging samples and co-assembling requires much more RAM, as you'd assemble all the data at once.

Sorry that was a typo, I meant not merging.

Rob-murphys avatar Aug 02 '22 08:08 Rob-murphys

Well, you could certainly try merging fewer samples and see what the outcome is.

asl avatar Aug 02 '22 08:08 asl

> Pretty simple. We are talking about the # of distinct k-mers here. Also observe that each sequencing error introduces k non-genomic k-mers.

Ah, I don't think I understand what is going on with the k-mers then. I understood it as each k-mer length generating its own set of k-mers.

Rob-murphys avatar Aug 02 '22 08:08 Rob-murphys

Hello, I would like to know whether you solved this problem and, if so, how. I'm having the same problem at the moment and hope to get some advice from you. Thanks.

zfsh avatar Apr 04 '24 09:04 zfsh