metaSPAdes out-of-memory errors on medium-size dataset
Description of bug
I am running metaSPAdes on a co-assembly of 6 samples totalling 19 GB of raw reads given to metaSPAdes. Despite this being a fairly typical size for a metagenomic dataset, I get out-of-memory errors when giving it 170 GB of memory (and 16 threads). I know this is a me problem, but I am unsure how to proceed, as that is already a huge amount of memory and it seems crazy that it OOMs.
Do you have any advice on how I could proceed? metaSPAdes produces better assemblies than other tools out there, so I would like to keep using it.
I understand there are other factors, but would it be possible for you to provide memory-usage data for metaSPAdes, similar to what you provide for SPAdes in the README?
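(For reference, the failing run had this general shape; the sketch below uses placeholder file names, and the exact command line is in the attached params.txt.)

```bash
# Sketch of the run described above; file names are placeholders,
# the real command line is in the attached params.txt.
spades.py --meta \
    -1 reads_R1.fastq.gz \
    -2 reads_R2.fastq.gz \
    -t 16 \
    -m 170 \
    -o metaspades_out
```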
spades.log
params.txt
SPAdes version
3.14.0
Operating System
CentOS Linux release 7.6.1810 (Core)
Python Version
No response
Method of SPAdes installation
Manual
No errors reported in spades.log
- [ ] Yes
Consider upgrading to the latest SPAdes 3.15.4
The error still occurs on the latest version of SPAdes
@Lamm-a, will you please post the new spades.log?
This run is slightly different from the one above. The forward and reverse reads are 27 GB each, and I have allocated 178 GB with 16 cores.
spades.log
SPAdes version 3.15.3 (not the absolute latest, so I am re-running).
Yes, you definitely need more RAM for such large datasets. You have ~15 billion distinct k-mers there.
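(Rough arithmetic, as an illustration only and not SPAdes's actual accounting: if a plain hash table cost on the order of 16 bytes per distinct k-mer, then 15 × 10⁹ k-mers × 16 B ≈ 240 GB, already beyond the 170 GB limit before any graph structures are built. SPAdes's internal structures differ, but the order of magnitude is the point.)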
Any predictions on how much? And what is the typical size of the metagenomic raw read sets people use?
Would skipping k-mer 21 drastically reduce the quality of the assembly? It seems like that would reduce the k-mer count by a lot.
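(For context, "skipping k-mer 21" would mean overriding the default k ladder with the -k option, as in the sketch below; the values are illustrative, since metaSPAdes normally chooses its ladder from the read length.)

```bash
# Start the k-mer ladder at 33 instead of 21 (values illustrative only).
spades.py --meta -k 33,55,77 \
    -1 reads_R1.fastq.gz -2 reads_R2.fastq.gz \
    -t 16 -m 170 -o metaspades_k33
```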
There is no way to predict the memory usage before the assembly as it depends on many things including the genome size, repeat content, error rate, strain content, etc.
Skipping shorter k-mers will not reduce the k-mer count in any way. Quite the opposite.
> Skipping shorter k-mers will not reduce the k-mer count in any way. Quite the opposite.
Oh how come that is the case?
Would you suggest now merging these into a co-assembly then? It looks like the amount of memory I would need will be prohibitive for me.
> Oh how come that is the case?
Pretty simple. We are talking about the # of distinct k-mers here. Also observe that each sequencing error introduces k non-genomic k-mers.
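(To put illustrative numbers on this claim: assume roughly 2 × 10¹⁰ sequenced bases, a 1% error rate, and k = 55. That gives about 2 × 10⁸ errors, each corrupting up to 55 overlapping 55-mers, i.e. on the order of 10¹⁰ mostly unique non-genomic k-mers. These figures are assumptions for the sake of the example, not measurements from this dataset.)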
> Would you suggest now merging these into a co-assembly then? It looks like the amount of memory I would need will be prohibitive for me.
What do you want to merge? You have just a single sample here. Also, merging samples and co-assembling requires much more RAM, as you'd be assembling all the data at once.
> What do you want to merge? You have just a single sample here. Also, merging samples and co-assembling requires much more RAM, as you'd be assembling all the data at once.
Sorry, that was a typo; I meant not merging.
Well, you could certainly try merging fewer samples and see what the outcome will be.
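(A per-sample alternative to the full co-assembly could look like the sketch below; the sample names are placeholders.)

```bash
# Assemble each sample on its own; peak RAM then scales with a single
# sample's distinct k-mer content rather than the pooled dataset's.
for s in sampleA sampleB sampleC; do
    spades.py --meta -t 16 -m 170 \
        -1 "${s}_R1.fastq.gz" \
        -2 "${s}_R2.fastq.gz" \
        -o "metaspades_${s}"
done
```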
> Pretty simple. We are talking about the # of distinct k-mers here. Also observe that each sequencing error introduces k non-genomic k-mers.
Ah, I don't think I understand what is going on with the k-mers then. I understood it as each k-mer length generating its own set of k-mers.
Hello, I would like to know whether you solved this problem and, if so, how you solved it. I'm having the same problem at the moment and hope to get some advice from you. Thanks.