strobealign

about the run memory

Open Wanli-HE opened this issue 3 months ago • 7 comments

Hi folks!

It is a really nice tool. Now I have a very large genome file, about 27 GB, and I am trying to map reads to this catalog with strobealign using -t 100 and 250 GB of memory, but it fails with an out-of-memory error.

So, do you have any suggestions for my case? How should I set the memory?

best, wanli

Wanli-HE, Sep 17 '25 10:09

Is the 27 GB genome file compressed or uncompressed?
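The distinction matters because the estimate below is stated in bases, not in bytes on disk: a gzipped FASTA is often several times smaller than the sequence it contains, while an uncompressed file also contains header lines and newlines that are not bases. A minimal Python sketch for getting the base count; the function names are hypothetical helpers, not part of strobealign, and gzip handling is assumed from the .gz suffix only.

```python
import gzip
from typing import Iterable


def count_bases_in_lines(lines: Iterable[str]) -> int:
    """Sum the sequence characters of FASTA lines, skipping '>' header lines."""
    return sum(len(line.strip()) for line in lines if not line.startswith(">"))


def count_bases(path: str) -> int:
    """Count bases in a plain or gzip-compressed FASTA.

    Hypothetical helper: the file is treated as gzip purely because its name
    ends in .gz, so the result is the uncompressed sequence length either way.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_bases_in_lines(handle)
```

For example, count_bases("ref.fa.gz") would give the number to plug into the 4-5x rule of thumb, whatever the file's size on disk.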

IIRC, the strobealign index takes about 4-5x the genome size in memory. So that would mean 108-135 GB in your case if the genome size is 27 billion bases. Each thread should not add much memory (at most a couple of megabytes).
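The rule of thumb above is simple arithmetic. A rough Python sketch of it, assuming 1 GB = 10^9 bytes, a genome size given as a base count, and a per-thread overhead of 2 MB as a stand-in for "a couple of megabytes"; the function and its defaults are hypothetical, not anything strobealign exposes.

```python
from typing import Tuple


def estimate_index_memory_gb(n_bases: int, threads: int = 1,
                             low_factor: float = 4.0, high_factor: float = 5.0,
                             per_thread_mb: float = 2.0) -> Tuple[float, float]:
    """Return a (low, high) memory range in GB from the 4-5x rule of thumb.

    n_bases: uncompressed genome size in bases.
    per_thread_mb: assumed small overhead per thread (an assumption, see above).
    """
    genome_gb = n_bases / 1e9                    # bases -> billions of bases
    thread_gb = threads * per_thread_mb / 1000   # MB -> GB
    return (genome_gb * low_factor + thread_gb,
            genome_gb * high_factor + thread_gb)
```

With 27 billion bases and -t 100 this gives roughly 108.2 to 135.2 GB, which shows that under these assumptions the thread count barely changes the total.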

Obviously, one suggestion is running on a machine with more memory.

ksahlin, Sep 17 '25 13:09

Hi, I suggest you run strobealign with the option -v. After it has read the reference FASTA, it will print a line like this:

Estimated total memory usage: 123.45 GB

The actual memory usage may be slightly larger, but it should be a pretty good estimate. (I suggest you then press Ctrl+C as soon as you see that line to cancel the run and prevent strobealign from actually creating the index.)
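The "run with -v, wait for the estimate, press Ctrl+C" procedure can be scripted. A hedged Python sketch: it assumes the estimate line is written to stderr with exactly the wording quoted above, and it sends SIGINT (what Ctrl+C sends on POSIX systems) once the line appears. The function names are hypothetical; only -v and -t come from this thread.

```python
import re
import signal
import subprocess
from typing import List, Optional

# Wording copied from the example line in this thread; other strobealign
# versions may word it differently (assumption).
ESTIMATE_RE = re.compile(r"Estimated total memory usage:\s*([0-9.]+)\s*GB")


def parse_memory_estimate(line: str) -> Optional[float]:
    """Return the estimate in GB if the line is the estimate line, else None."""
    match = ESTIMATE_RE.search(line)
    return float(match.group(1)) if match else None


def run_until_estimate(cmd: List[str]) -> Optional[float]:
    """Start strobealign, read its log until the estimate appears, then stop it.

    Assumes the -v log goes to stderr. Returns None if the process exits
    before printing the estimate.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.PIPE, text=True)
    estimate = None
    try:
        for line in proc.stderr:
            estimate = parse_memory_estimate(line)
            if estimate is not None:
                break
    finally:
        if proc.poll() is None:
            proc.send_signal(signal.SIGINT)  # same signal as pressing Ctrl+C
        proc.stderr.close()  # don't block if the process keeps logging
        proc.wait()
    return estimate
```

Usage might look like run_until_estimate(["strobealign", "-v", "-t", "8", "ref.fa", "reads.1.fastq.gz", "reads.2.fastq.gz"]), with the file names being the placeholders used elsewhere in this thread.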

@ksahlin I will send in a PR to print that line even without -v. (Maybe even a warning if the available memory is less than the shown value.)
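The warning idea amounts to comparing the estimate with the memory currently available. A hypothetical Python sketch of such a check, not strobealign code: it reads MemAvailable from the text of Linux's /proc/meminfo (reported in kB, i.e. units of 1024 bytes) and adds a 10% margin, an assumed value chosen because actual usage may be slightly larger than the estimate.

```python
from typing import Optional


def mem_available_gb(meminfo_text: str) -> Optional[float]:
    """Return MemAvailable in GB (10^9 bytes) from /proc/meminfo contents."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) * 1024 / 1e9  # kB (KiB) -> bytes -> GB
    return None


def memory_warning(estimate_gb: float, available_gb: float,
                   margin: float = 1.1) -> Optional[str]:
    """Return a warning if available memory is below estimate * margin.

    The 10% default margin is an assumption, not a strobealign setting.
    """
    if available_gb < estimate_gb * margin:
        return (f"warning: estimated {estimate_gb:.2f} GB needed, "
                f"but only {available_gb:.2f} GB available")
    return None
```

On Linux, the input for mem_available_gb could be obtained with open("/proc/meminfo").read(); other systems would need a different source for the available memory.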

marcelm, Sep 18 '25 09:09

Thanks

Wanli-HE, Sep 18 '25 09:09

thanks

Wanli-HE, Sep 18 '25 09:09

Hi again, another question: after creating the index, I will use it each time I map (strobealign --use-index -t 8 ref.fa reads.1.fastq.gz reads.2.fastq.gz | samtools .). Do you have any idea how much memory is needed for the mapping? I am doing co-binning, so it is better to concatenate more samples, and I have to estimate the final mapping memory for my machine. Do you have any suggestions on this?

Wanli-HE, Sep 26 '25 09:09

The amount of memory is the same. There are some ideas for splitting the index into smaller chunks, but that is not implemented yet. So the numbers @ksahlin mentions above also apply when you run with --use-index.
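Since mapping with a prebuilt index needs about as much memory as building it, planning for co-binning comes down to how large the concatenated reference may get. A hypothetical Python sketch, assuming "cat more samples" means concatenating per-sample assemblies into one catalog, that memory is dominated by the index, and that the pessimistic 5x end of the rule of thumb applies; it counts how many assemblies, in the given order, fit a RAM budget.

```python
from typing import Sequence


def max_samples_in_catalog(assembly_bases: Sequence[int], ram_gb: float,
                           factor: float = 5.0) -> int:
    """Count how many assemblies can be concatenated before the estimated
    memory (factor * catalog size in billions of bases) exceeds ram_gb.

    Hypothetical helper: factor=5.0 is the upper end of the 4-5x rule of thumb.
    """
    total_bases = 0
    count = 0
    for bases in assembly_bases:
        if (total_bases + bases) / 1e9 * factor > ram_gb:
            break  # adding this assembly would exceed the budget
        total_bases += bases
        count += 1
    return count
```

For instance, with six 10-billion-base assemblies and a 240 GB budget, this returns 4, since a fifth assembly would push the estimate to 250 GB.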

marcelm, Sep 26 '25 09:09

ok, thanks

Wanli-HE, Sep 26 '25 12:09