About memory usage during a run
Hi folks!
It is a really nice tool. I now have a very large genome file, about 27 GB. When I run strobealign to map reads against this catalog with -t 100 and 250 GB of memory, it fails with an out-of-memory error.
Do you have any suggestions for my case? How should I set the memory?
Best, wanli
Is the 27 GB genome file compressed or uncompressed?
IIRC, the strobealign index takes about 4-5x the genome size in memory. That would mean 108-135 GB in your case if the genome size is 27 billion bases. Each thread should not add much to memory (at most a couple of megabytes).
Obviously, one suggestion is to run on a machine with more memory.
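To make the back-of-the-envelope arithmetic above explicit, here is a minimal sketch. The 4-5 bytes per reference base is only the recollection quoted above, and the 2 MB per thread is an assumed stand-in for "a couple of megabytes"; neither is a documented strobealign constant, and `estimate_memory_gb` is a hypothetical helper.

```python
def estimate_memory_gb(genome_bases: int, threads: int,
                       factor: float, per_thread_mb: float = 2.0) -> float:
    """Rough memory estimate in GB (1 GB = 10**9 bytes).

    factor: assumed index bytes per reference base (recalled as 4-5).
    per_thread_mb: assumed per-thread overhead ("a couple of megabytes").
    """
    index_bytes = genome_bases * factor
    thread_bytes = threads * per_thread_mb * 10**6
    return (index_bytes + thread_bytes) / 10**9


genome_bases = 27 * 10**9  # 27 billion bases, as in the question
for factor in (4.0, 5.0):
    total = estimate_memory_gb(genome_bases, threads=100, factor=factor)
    print(f"factor {factor}: ~{total:.1f} GB")
```

With 100 threads the per-thread term adds only about 0.2 GB, which is why the thread count barely matters here; the index size dominates.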
Hi, I suggest you run strobealign with the option -v. After it has read the reference FASTA, it prints a line like this:
Estimated total memory usage: 123.45 GB
The actual memory usage may be slightly larger, but it should be a pretty good estimate. (I suggest you press Ctrl+C as soon as you see that line to cancel the run and prevent strobealign from actually creating the index.)
@ksahlin I will send in a PR to print that line even without -v. (Maybe even a warning if the available memory is less than the shown value.)
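The "run with -v, then press Ctrl+C" advice can be automated. This is a sketch under assumptions: it assumes the estimate line appears on stderr in exactly the format shown above, it stops the process with SIGTERM rather than Ctrl+C's SIGINT, and the 10% safety margin in `fits_in_memory` is an arbitrary choice reflecting "may be slightly larger". All function names are hypothetical.

```python
import re
import subprocess
from typing import List, Optional

# Matches the line printed with -v, e.g. "Estimated total memory usage: 123.45 GB"
_ESTIMATE_RE = re.compile(r"Estimated total memory usage:\s*([0-9.]+)\s*GB")


def parse_memory_estimate(line: str) -> Optional[float]:
    """Return the estimate in GB if `line` is the estimate line, else None."""
    match = _ESTIMATE_RE.search(line)
    return float(match.group(1)) if match else None


def fits_in_memory(estimate_gb: float, available_gb: float,
                   margin: float = 1.1) -> bool:
    """True if the estimate plus an assumed 10% margin fits in available_gb."""
    return estimate_gb * margin <= available_gb


def probe_estimate(cmd: List[str]) -> Optional[float]:
    """Start strobealign with -v and stop it once the estimate is printed.

    Assumes log lines go to stderr; SAM output on stdout is discarded.
    Returns None if the process exits without printing the estimate.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.PIPE, text=True)
    try:
        for line in proc.stderr:
            estimate = parse_memory_estimate(line)
            if estimate is not None:
                return estimate
        return None
    finally:
        proc.terminate()  # stop before the index is built
        proc.wait()
```

A possible invocation is `probe_estimate(["strobealign", "-v", "ref.fa", "reads.1.fastq.gz", "reads.2.fastq.gz"])`, followed by `fits_in_memory(estimate, 250.0)` for a 250 GB machine.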
Thanks
thanks
Hi again, another question: after creating the index, I will use it each time I map (strobealign --use-index -t 8 ref.fa reads.1.fastq.gz reads.2.fastq.gz | samtools .). Do you have any idea how much memory the mapping needs? I am doing co-binning, so it is better to concatenate more samples, and I have to estimate the final mapping memory for my machine. Do you have any suggestions on this?
The amount of memory is the same. There are some ideas for splitting the index into smaller chunks, but that is not implemented yet. So the numbers @ksahlin mentions above also apply when you run with --use-index.
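Since mapping with --use-index needs about as much memory as indexing, planning a co-binning run comes down to the size of the concatenated reference. The sketch below sums sequence lengths across several (optionally gzipped) FASTA files before concatenating them, then applies the 4-5x factor recalled above. The factor is an assumption, and the -v estimate remains the more reliable number; the helper names are hypothetical.

```python
import gzip
from pathlib import Path
from typing import Iterable, Tuple


def count_bases_in_lines(lines: Iterable[str]) -> int:
    """Count sequence characters, skipping FASTA header lines."""
    return sum(len(line.strip()) for line in lines if not line.startswith(">"))


def count_bases(path: Path) -> int:
    """Count bases in one FASTA file; gzip is detected by the .gz suffix."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return count_bases_in_lines(handle)


def estimate_range_gb(total_bases: int,
                      factors: Tuple[float, float] = (4.0, 5.0)) -> Tuple[float, float]:
    """Low/high memory estimate in GB, assuming `factors` bytes per base."""
    low, high = (total_bases * f / 10**9 for f in factors)
    return low, high
```

For several per-sample catalogs, `total = sum(count_bases(Path(p)) for p in paths)` followed by `estimate_range_gb(total)` gives a rough range for the concatenated reference before deciding how many samples to include.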
ok, thanks