memory allocation failed

Open jermp opened this issue 3 months ago • 4 comments

Hi, I'm using GGCAT for building eulertigs. So far, I had no problem and successfully built eulertigs for large collections.

But I noticed that on the whole 661k "Blackwell" collection, as well as on this collection of fungi https://zenodo.org/records/17093970 which counts 1624 genomes, the computation gets aborted at its final stage with the same message:

Started phase: eulertigs building [step1]
memory allocation of 309237645312 bytes failed
Aborted (core dumped)

It seems that the algorithm is trying to allocate >300GB of RAM in a single request in both scenarios, which looks suspicious to me. Isn't the max RAM usage capped with option -m?
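For reference, the size in the abort message can be converted to more readable units. This is only arithmetic on the number printed above; the helper name `describe_allocation` is made up for illustration and is not part of GGCAT.

```python
GB = 1000 ** 3   # decimal gigabyte
GIB = 1024 ** 3  # binary gibibyte


def describe_allocation(n_bytes: int) -> str:
    """Format a byte count in both decimal GB and binary GiB."""
    return f"{n_bytes} bytes = {n_bytes / GB:.1f} GB = {n_bytes / GIB:.1f} GiB"


# Size taken verbatim from the "memory allocation of ... bytes failed" line.
requested = 309237645312
print(describe_allocation(requested))
# prints: 309237645312 bytes = 309.2 GB = 288.0 GiB
```

The request is a single allocation of exactly 288 GiB, which is well above the 64 passed to -m.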

This is the command I'm using, just for reference:

ggcat build -k 31 -j 64 -l ~/jgi_fungi_filenames.txt -s 1 --eulertigs -o jgi_fungi.k31.eulertigs.fa -t tmp_dir -m 64

That is, 64 parallel threads and 64 GB of RAM.

In any case, I also have ~500GB of RAM on the test machine I'm using, so I don't know why that message is showing up and causes the crash.

Any help very much appreciated. thanks!

Best, -Giulio

jermp avatar Oct 30 '25 09:10 jermp

I forgot to mention that I'm using ggcat_cmdline 2.0.0.

(I retried the previous command with fewer threads and less memory but I got the same error.)

jermp avatar Oct 30 '25 10:10 jermp

Oh, I now see that the process is taking far more memory than I expected: already 350G out of my 500G available. Is this normal? I thought the -m parameter was there to avoid this problem, but I notice you wrote "This usage does not include the needed memory for the processing steps."...

jermp avatar Oct 30 '25 13:10 jermp

Hi Giulio, I was able to reproduce the bug in the latest version as well, so it definitely needs fixing. I suspect it's caused by a deserialization error while decoding a sequence length, which then triggers the big allocation you pointed out. I will try to fix it as soon as I can.
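To illustrate the kind of failure suspected here, below is a minimal sketch of how a mis-decoded length prefix can turn into an oversized allocation request, and how a bounds check catches it early. It is not GGCAT's actual code or on-disk format: the LEB128-style varint encoding and the names `read_varint` and `read_sequence` are assumptions made purely for the example.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 integer at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:  # high bit clear: last byte of the integer
            return value, pos
        shift += 7
        if shift > 63:
            raise ValueError("varint too long")


def read_sequence(buf: bytes, pos: int) -> tuple[bytes, int]:
    """Read one length-prefixed record (hypothetical format)."""
    length, pos = read_varint(buf, pos)
    remaining = len(buf) - pos
    # Without this check, a corrupt or misaligned length would be passed
    # straight to the allocator, however large it is.
    if length > remaining:
        raise ValueError(
            f"declared length {length} exceeds remaining {remaining} bytes"
        )
    return buf[pos:pos + length], pos + length


record = b"\x04ACGT"             # length 4, then 4 payload bytes
print(read_sequence(record, 0))  # well-formed read
try:
    read_sequence(record, 1)     # misaligned: 'A' (65) is read as the length
except ValueError as err:
    print("rejected:", err)
```

Reading from the wrong offset makes a payload byte look like a length, which is one way a decoder can end up asking for far more memory than the data justifies.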

Guilucand avatar Nov 09 '25 14:11 Guilucand

Thank you Andrea!

jermp avatar Nov 09 '25 14:11 jermp

Hi Giulio, I fixed some bugs in dev and tested the construction, now it works on my machine. Can you test it? Thanks, Andrea

Guilucand avatar Nov 23 '25 18:11 Guilucand

Thank you Andrea. How should I build the project now that I'm on the dev branch? I tried cargo build and cargo install --path crates/cmdline/ --locked as described in the README but both fail.

jermp avatar Nov 23 '25 20:11 jermp

Ok, I managed to build it :) I needed to first rustup update.

jermp avatar Nov 23 '25 20:11 jermp

Hi Andrea, on the entire Blackwell 661k, it failed again with the same error message:

started phase: maximal unitigs links building [step 3]
memory allocation of 154618822656 bytes failed
Aborted (core dumped)

Even though it is large, I have 0.5TB of RAM, so the allocation should be possible.
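One way to check whether a request of this size could succeed at a given moment is to compare it against the memory the kernel reports as available and against any per-process address-space limit. The sketch below assumes a Linux machine (it reads `/proc/meminfo` and uses the Unix-only `resource` module); the function names are invented for illustration, and other factors such as cgroup limits or the kernel overcommit policy are not covered.

```python
import resource


def parse_mem_available(meminfo_text: str) -> int:
    """Return MemAvailable in bytes from /proc/meminfo text (values are in kB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemAvailable not found")


def allocation_blockers(request: int, available: int, as_limit: int) -> list[str]:
    """List the reasons why a single allocation of `request` bytes may fail."""
    reasons = []
    if as_limit != resource.RLIM_INFINITY and request > as_limit:
        reasons.append("exceeds RLIMIT_AS")
    if request > available:
        reasons.append("exceeds MemAvailable")
    return reasons


def check_now(request: int) -> list[str]:
    """Evaluate the request against the current state of this machine."""
    with open("/proc/meminfo") as fh:
        available = parse_mem_available(fh.read())
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_AS)
    return allocation_blockers(request, available, soft_limit)
```

Calling `check_now(154618822656)` while the job is running would show whether the 144 GiB request fits into what is left at that point. Earlier in this thread the process was seen using about 350G of the 500G, so a request of this size on top of that may not fit even though the machine has 0.5TB in total.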

jermp avatar Nov 24 '25 13:11 jermp