minimap2
[morecore] insufficient memory
Hi,
I used minimap2 (2.17-r941). The files NbV1ChF.fasta and ragoo.fasta are 2.8 G and 3.4 G, respectively. However, I ran out of memory on a 2 TB machine.
minimap2 -ax asm5 --cs -t8 ../../NbV1ChF.fasta ../ragoo.fasta > pm_against_ref.sam 2> pm_contigs_against_ref.sam.log
/bin/bash: line 1: 10389 Aborted (core dumped) minimap2 -ax asm5 --cs -t8 ../../NbV1ChF.fasta ../ragoo.fasta > pm_against_ref.sam 2> pm_contigs_against_ref.sam.log
less pm_contigs_against_ref.sam.log
[M::mm_idx_gen::55.511*1.79] collected minimizers
[M::mm_idx_gen::60.192*2.17] sorted minimizers
[M::main::60.192*2.17] loaded/built the index for 19 target sequence(s)
[M::mm_mapopt_update::64.517*2.10] mid_occ = 342
[M::mm_idx_stat] kmer size: 19; skip: 19; is_hpc: 0; #seq: 19
[M::mm_idx_stat::69.857*2.01] distinct minimizers: 143929053 (83.21% are singletons); average occurrences: 1.928; average spacing: 9.884
[M::worker_pipeline::3024.805*3.40] mapped 5 sequences
[M::worker_pipeline::4921.171*3.81] mapped 6 sequences
[M::worker_pipeline::7761.560*3.75] mapped 4 sequences
[morecore] insufficient memory
What could cause such high memory consumption? How much more memory do I need?
Thank you in advance,
Michal
That may be caused by integer overflow. I may need to reproduce the issue to fix it...
I am seeing a similar issue with a reference fasta that is 1.1 Gb and reads around 2 Gb. However, I am not seeing the [morecore] insufficient memory error; I am just getting kicked off my interactive job, or the job is failing after hitting memory limits in excess of 24 Gb.
Of possible note, my reference file does have some ambiguous bases (N) in it. Not sure if that makes a difference?
I am using version 2.15-r905
Example command I am using is
minimap2 -aLx map-ont -t 8 ref.fasta reads.fastq.gz > out.sam
Dropping the -L option doesn't seem to make a difference either.
For whole mammalian genome alignment, using >20 Gb of memory is expected, especially when you have chromosome assemblies and request many threads. However, there must be something going wrong if 2 TB is still "insufficient".
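For what it's worth, a quick way to see where the peak actually lands is to rerun under GNU time (assuming /usr/bin/time on your system is GNU time) while varying the thread count; file names below are just placeholders:
# illustrative only: record peak RSS for a 4-thread run, then read it back
/usr/bin/time -v -o mem_t4.txt minimap2 -ax asm5 --cs -t4 ref.fasta asm.fasta > out.sam 2> out.sam.log
grep "Maximum resident set size" mem_t4.txt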
In my use case, the reference is a 16S rRNA database (sequences all around 1.5Kb) and the reads are also from 16S rRNA sequencing (reads have been filtered to 1.4-1.6Kb). All cDNA.
For one or two of the samples, I was still hitting memory limits with 64Gb of memory.
High memory usage is expected given the repetitiveness of the reference data.
@mbhall88 you might want to check whether your account is limited by a RAM quota too:
% ulimit -a
data seg size (kbytes, -d) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
etc
@mbhall88 are you mapping to "pick" OTUs or to classify (taxonomy) reads?
You may want to try using the feature-classifier within q2. You can train the classifier using information specific to your sample preparation (e.g., the primers used). More information here: https://docs.qiime2.org/2019.7/data-resources/
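If you go that route, the workflow is roughly the following sketch (all of the .qza artifact names are placeholders; the reference reads and taxonomy come from whichever 16S database you train on):
# rough sketch of the q2 feature-classifier route; artifact names are placeholders
qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads ref-seqs.qza --i-reference-taxonomy ref-taxonomy.qza --o-classifier classifier.qza
qiime feature-classifier classify-sklearn --i-classifier classifier.qza --i-reads rep-seqs.qza --o-classification taxonomy.qza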
High memory usage is expected, given the repetitiveness of the reference data.
Another example: I have a sample that is 311 MB in size (compressed), mapping to the database I previously mentioned, and 150 Gb of RAM is still not enough (the same goes for tens of other samples of the same size). This is the stderr I get:
[M::mm_idx_gen::30.581*1.57] collected minimizers
[M::mm_idx_gen::33.309*2.16] sorted minimizers
[M::main::33.486*2.15] loaded/built the index for 695171 target sequence(s)
[M::mm_mapopt_update::33.701*2.14] mid_occ = 16535
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 695171
[M::mm_idx_stat::33.842*2.14] distinct minimizers: 7165551 (60.20% are singletons); average occurrences: 25.429; average spacing: 5.537
@lh3 if you are saying that this high memory is expected then that's fine, but I think you should clarify this in the documentation because that is crazy high memory usage for mapping a small sample to a seemingly small reference database.
[M::mm_mapopt_update::33.701*2.14] mid_occ = 16535
The repetitiveness is really high here. The peak memory will depend on what sequences are being mapped at the same time across threads. It can vary a lot. One way is to reduce the number of threads.
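In practice that means lowering -t, and the minibatch size (-K, 500M query bases per batch by default) gives a second knob on how much is in flight at once. Something along these lines, with purely illustrative values and placeholder file names:
# tune -t and -K for your data; smaller values bound the peak at the cost of speed
minimap2 -t 2 -K 100M -ax map-ont 16S_db.fasta reads.fastq.gz > out.sam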
We're seeing something similar with one of our PhD students too. She's aligning a ~1.6 GB FASTQ to a set of 16S references (n=~21000, 50 Mb). The job slowly but surely consumes the 64 GB of RAM on her machine and then gets killed by the OOM killer. We've overcome it temporarily by setting the minibatch size to something lower than the 500M default with -K. I tried to diagnose by running minimap2 on one thread; the strace of the child alignment process shows it's just running mprotect and mmap until all the RAM is consumed.
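For reference, the tracing was roughly along these lines (a sketch; the exact invocation and file names may have differed):
# trace only the memory-related syscalls of minimap2 and its threads; strace output goes to strace.log
strace -f -o strace.log -e trace=mmap,mprotect,munmap,brk minimap2 -x map-ont -t 1 refs.fasta reads.fastq > out.paf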
The repetitiveness is really high here. The peak memory will depend on what sequences are being mapped at the same time across threads. It can vary a lot. One way is to reduce the number of threads.
I tried running with a single thread and all 20 samples still use more than 150 Gb of memory...
We're seeing something similar with one of our PhD students too. She's aligning a ~1.6 GB FASTQ to a set of 16S references (n=~21000, 50 Mb). The job slowly but surely consumes the 64 GB of RAM on her machine and then gets killed by the OOM killer. We've overcome it temporarily by setting the minibatch size to something lower than the 500M default with -K. I tried to diagnose by running minimap2 on one thread; the strace of the child alignment process shows it's just running mprotect and mmap until all the RAM is consumed.
I will give the minibatch approach a go. Thanks @SamStudio8
Also, could you try --no-kalloc --print-qname? Thanks.
I was able to get 20/22 samples to complete within 100Gb RAM with the following options
minimap2 -K 100M --no-kalloc --print-qname -aLx map-ont -t 1 <target> <query>
I am trying -K 50M to see if I can get the last two to finish.
I think this is related to https://github.com/lh3/minigraph/issues/7.
I am trying to align two chromosomes which are both ~800Mb.
Testing with K100 now.
Crashed again with Segmentation fault (core dumped).
Ok, so to get my samples to all run in under 50Gb RAM I had to use the following options
minimap2 -t 1 -K 25M --no-kalloc --print-qname -aLx map-ont <target> <query>
Although I don't necessarily think the --print-qname option makes an impact on the memory footprint.
I ran into the same error too. I want to do an alignment between two genomes. The command is the following: minimap2 -ax asm5 --cs -t 20 $ref $query > query2ref.sam
My genomes are about 960 Mb in size.
I do not know how to address it. Has anyone solved it?
I ran into the same error too when doing a self-mapping; the genome size is 850 Mb.
/minimap2 -t 4 -x asm5 -o Bch.Bch.paf Bch.genome.fa Bch.genome.fa
minimap2 v2.26-r1175
Best, Kun