bcalm
bcalm copied to clipboard
Memory error with many reference genomes as input
I'm trying to run BCALM on a very large database and I keep getting errors that look like:
mem: [ 100, 91622, 92822] MB EXCEPTION: Pool reserve() failed for 576716800000 (+ 1600 extra for 36 cores) bytes
Is there some way to get around this? More/fewer cores? Lower max-memory (I don't have much more RAM available)? Large value for -max-disk
parameter?
Thanks!
how many distinct kmers reported at the beginning of the output?
I don't see k-mers reported in the std output. In the .hf5 file there's a field that says estimated k-mer number: 69880669047 - just under 70 billion. (k=31, would trying a lower number help?)
can you please paste the full bcalm output? also what's the genome size / fastq file size? Bcalm on 70 billion distinct kmers is no small feat (what organism is that?) and 92 GB ram would be understandably low..
I've pasted it below.
It's for a metagenomic simulation - the concatenated reference genomes are ~70Gbp, so this matches the k-mer count assuming almost all are unique.
From what I understood the error occurred with ~550GB RAM which is what I passed to the -max-memory
flag
[Approximating frequencies of minimizers ] 100 % elapsed: 5 min 1 sec remaining: 0 min 0 sec cpu: 100.0 % mem: [ 321, 321, 516] MB
[DSK: Collecting stats on all_download...] 100 % elapsed: 0 min 2 sec remaining: 0 min 0 sec cpu: 100.0 % mem: [ 92, 92, 516] MB
[DSK: Pass 1/1, Step 1: partitioning ] 20 % elapsed: 10 min 38 sec remaining: 42 min 30 sec cpu: 1589.0 % mem: [100293, 100293, 100293]
[DSK: Pass 1/1, Step 1: partitioning ] 21 % elapsed: 11 min 4 sec remaining: 41 min 37 sec cpu: 1601.2 % mem: [101318, 101318, 101318]
[DSK: Pass 1/1, Step 1: partitioning ] 22 % elapsed: 11 min 30 sec remaining: 40 min 46 sec cpu: 1612.5 % mem: [102247, 102247, 102247]
[DSK: Pass 1/1, Step 1: partitioning ] 23 % elapsed: 11 min 56 sec remaining: 39 min 57 sec cpu: 1623.0 % mem: [103117, 103117, 103117]
[DSK: Pass 1/1, Step 1: partitioning ] 24 % elapsed: 12 min 22 sec remaining: 39 min 9 sec cpu: 1632.6 % mem: [103934, 103934, 103934]
[DSK: Pass 1/1, Step 1: partitioning ] 25 % elapsed: 12 min 48 sec remaining: 38 min 24 sec cpu: 1641.8 % mem: [104709, 104709, 104709]
[DSK: Pass 1/1, Step 1: partitioning ] 26 % elapsed: 13 min 14 sec remaining: 37 min 41 sec cpu: 1650.3 % mem: [105418, 105418, 105418]
[DSK: Pass 1/1, Step 1: partitioning ] 27 % elapsed: 13 min 41 sec remaining: 36 min 58 sec cpu: 1658.2 % mem: [106074, 106074, 106074]
[DSK: Pass 1/1, Step 1: partitioning ] 28 % elapsed: 14 min 7 sec remaining: 36 min 17 sec cpu: 1665.7 % mem: [106658, 106658, 106658]
[DSK: Pass 1/1, Step 1: partitioning ] 29 % elapsed: 14 min 33 sec remaining: 35 min 37 sec cpu: 1672.7 % mem: [107197, 107197, 107197]
[DSK: Pass 1/1, Step 1: partitioning ] 30 % elapsed: 14 min 59 sec remaining: 34 min 58 sec cpu: 1679.3 % mem: [107646, 107646, 107646]
[DSK: Pass 1/1, Step 1: partitioning ] 31 % elapsed: 15 min 25 sec remaining: 34 min 19 sec cpu: 1685.5 % mem: [108080, 108080, 108080]
[DSK: Pass 1/1, Step 1: partitioning ] 32 % elapsed: 15 min 51 sec remaining: 33 min 41 sec cpu: 1691.4 % mem: [108456, 108456, 108456]
[DSK: Pass 1/1, Step 1: partitioning ] 33 % elapsed: 16 min 18 sec remaining: 33 min 5 sec cpu: 1697.0 % mem: [108779, 108779, 108779]
[DSK: Pass 1/1, Step 1: partitioning ] 34 % elapsed: 16 min 43 sec remaining: 32 min 28 sec cpu: 1702.2 % mem: [109079, 109079, 109079]
[DSK: Pass 1/1, Step 1: partitioning ] 35 % elapsed: 17 min 10 sec remaining: 31 min 52 sec cpu: 1707.2 % mem: [109336, 109336, 109336]
[DSK: Pass 1/1, Step 1: partitioning ] 36 % elapsed: 17 min 36 sec remaining: 31 min 17 sec cpu: 1712.0 % mem: [109566, 109566, 109566]
[DSK: Pass 1/1, Step 1: partitioning ] 37 % elapsed: 18 min 2 sec remaining: 30 min 42 sec cpu: 1716.5 % mem: [109763, 109763, 109763]
[DSK: Pass 1/1, Step 1: partitioning ] 38 % elapsed: 18 min 28 sec remaining: 30 min 8 sec cpu: 1720.8 % mem: [109920, 109920, 109920]
[DSK: Pass 1/1, Step 1: partitioning ] 39 % elapsed: 18 min 54 sec remaining: 29 min 34 sec cpu: 1724.9 % mem: [110064, 110064, 110064]
[DSK: Pass 1/1, Step 1: partitioning ] 40 % elapsed: 19 min 20 sec remaining: 29 min 0 sec cpu: 1728.8 % mem: [110191, 110191, 110191]
[DSK: Pass 1/1, Step 1: partitioning ] 41 % elapsed: 19 min 48 sec remaining: 28 min 30 sec cpu: 1729.1 % mem: [102528, 110191, 110194]
[DSK: Pass 1/1, Step 1: partitioning ] 42 % elapsed: 20 min 19 sec remaining: 28 min 3 sec cpu: 1726.8 % mem: [97109, 110191, 110194] $
[DSK: Pass 1/1, Step 1: partitioning ] 43 % elapsed: 20 min 50 sec remaining: 27 min 37 sec cpu: 1723.6 % mem: [97199, 110191, 110194] $
[DSK: Pass 1/1, Step 1: partitioning ] 44 % elapsed: 21 min 21 sec remaining: 27 min 10 sec cpu: 1720.6 % mem: [97268, 110191, 110194] M
[DSK: Pass 1/1, Step 1: partitioning ] 45 % elapsed: 21 min 54 sec remaining: 26 min 46 sec cpu: 1715.4 % mem: [91628, 110191, 110194] M
[DSK: Pass 1/1, Step 1: partitioning ] 46 % elapsed: 22 min 30 sec remaining: 26 min 25 sec cpu: 1706.1 % mem: [80606, 110191, 110194] M
[DSK: Pass 1/1, Step 1: partitioning ] 47 % elapsed: 23 min 8 sec remaining: 26 min 5 sec cpu: 1695.0 % mem: [80651, 110191, 110194] M
[DSK: Pass 1/1, Step 1: partitioning ] 48 % elapsed: 23 min 53 sec remaining: 25 min 52 sec cpu: 1677.5 % mem: [57260, 110191, 110194] M
[DSK: Pass 1/1, Step 1: partitioning ] 49 % elapsed: 24 min 56 sec remaining: 25 min 57 sec cpu: 1639.9 % mem: [45398, 110191, 110194] M
[DSK: Pass 1/1, Step 2: counting kmers ] 50 % elapsed: 27 min 38 sec remaining: 27 min 38 sec cpu: 1509.7 % mem: [ 99, 110191, 110194] MB
EXCEPTION: Pool reserve() failed for 629145600000 (+ 1600 extra for 36 cores) bytes
ahh thanks, there is something really odd. What does the input look like? A multifasta file where each entry is a reference genome? What's the longest sequence in there?
Yes exactly - it's a multifasta of reference genomes. The longest is 16Mbp
that sounds quite reasonable but I never tried that sort of counting. The k-mer counter is really optimized for counting reads, or single reference genomes, but not really collections of reference genomes of various lengths, and probably its guess for internal memory allocation parameters is thrown off by the various genome sizes. What -max-memory
value did you pass, 500000
?
Yes, I've tried 500GB-650GB, and also various values for max-disk. Are there parameters that you think would get it to work? Or another assembler that you would recommend?
Yeah, I'm sorry it didn't work, I could look into it sometimes. Meanwhile you could give https://github.com/pmelsted/bifrost a try for outputting unitigs!
Thanks
I'm leaving this open, would like to make bcalm able to process collections of genomes
OK - It could be a good idea as a number of recent k-mer indexing/querying tools recommend using bcalm, not necessarily in the usual context of assembly of sequencing reads. https://github.com/kamimrcht/REINDEER https://github.com/Malfoy/Blight
REINDEER is not limited by the bug you point out, as it makes BCALM process one genome (= one color) after the other, i.e. smaller chunks at a time.
David, This may be a use-case for TwoPaCo:
https://github.com/medvedevgroup/TwoPaCo
David, This may be a use-case for TwoPaCo:
Thanks! I'll look into it.
I may have identified the issue with BCALM taking too much memory with many ref genomes. It can be temp-fixed by doing this change.
diff --git a/gatb-core/src/gatb/kmer/impl/SortingCountAlgorithm.cpp b/gatb-core/src/gatb/kmer/impl/SortingCountAlgorithm.cpp
index c87c6f8..3e05d36 100644
--- a/gatb-core/src/gatb/kmer/impl/SortingCountAlgorithm.cpp
+++ b/gatb-core/src/gatb/kmer/impl/SortingCountAlgorithm.cpp
@@ -1231,7 +1231,7 @@ void SortingCountAlgorithm<span>::fillPartitions (size_t pass, Iterator<Sequence
/** We launch the iteration of the sequences iterator with the created
* functors. */
- size_t groupSize = 1000;
+ size_t groupSize = 10;
bool deleteSynchro = true;
(reduces mem by 5x on my end) Will further study a more robust change, but meanwhile @dpellow you could try it if you'd like.