
[morecore] insufficient memory

Open mictadlo opened this issue 4 years ago • 19 comments

Hi, I used minimap2 (2.17-r941). NbV1ChF.fasta and ragoo.fasta are 2.8 GB and 3.4 GB, respectively. However, I ran out of memory on a 2 TB machine.

minimap2 -ax asm5 --cs -t8 ../../NbV1ChF.fasta ../ragoo.fasta > pm_against_ref.sam 2> pm_contigs_against_ref.sam.log
/bin/bash: line 1: 10389 Aborted                 (core dumped) minimap2 -ax asm5 --cs -t8 ../../NbV1ChF.fasta ../ragoo.fasta > pm_against_ref.sam 2> pm_contigs_against_ref.sam.log
less pm_contigs_against_ref.sam.log 
[M::mm_idx_gen::55.511*1.79] collected minimizers
[M::mm_idx_gen::60.192*2.17] sorted minimizers
[M::main::60.192*2.17] loaded/built the index for 19 target sequence(s)
[M::mm_mapopt_update::64.517*2.10] mid_occ = 342
[M::mm_idx_stat] kmer size: 19; skip: 19; is_hpc: 0; #seq: 19
[M::mm_idx_stat::69.857*2.01] distinct minimizers: 143929053 (83.21% are singletons); average occurrences: 1.928; average spacing: 9.884
[M::worker_pipeline::3024.805*3.40] mapped 5 sequences
[M::worker_pipeline::4921.171*3.81] mapped 6 sequences
[M::worker_pipeline::7761.560*3.75] mapped 4 sequences
[morecore] insufficient memory

What could cause such a high memory consumption? How much more memory do I need?

Thank you in advance,

Michal

mictadlo avatar Oct 28 '19 22:10 mictadlo

That may be caused by integer overflow. I may need to reproduce the issue to fix it...

lh3 avatar Oct 28 '19 23:10 lh3

I am seeing a similar issue with a reference fasta that is 1.1 GB and reads around 2 GB. However, I am not seeing the [morecore] insufficient memory error; instead, I am getting kicked off my interactive job, or the job fails after hitting memory limits in excess of 24 GB.

Of possible note, my reference file does have some ambiguous bases (N) in it. Not sure if that makes a difference?

I am using version 2.15-r905

Example command I am using is

minimap2 -aLx map-ont -t 8 ref.fasta reads.fastq.gz > out.sam

Dropping the -L option doesn't seem to make a difference either.

mbhall88 avatar Oct 31 '19 18:10 mbhall88

For whole mammalian genome alignment, using >20 GB of memory is expected, especially when you have chromosome-level assemblies and request many threads. However, there must be something going wrong if 2 TB is still "insufficient".

lh3 avatar Oct 31 '19 18:10 lh3

In my use case, the reference is a 16S rRNA database (sequences all around 1.5Kb) and the reads are also from 16S rRNA sequencing (reads have been filtered to 1.4-1.6Kb). All cDNA.

For one or two of the samples, I was still hitting memory limits with 64Gb of memory.

mbhall88 avatar Oct 31 '19 18:10 mbhall88

High memory usage is expected given the repetitiveness of the reference data.

lh3 avatar Oct 31 '19 18:10 lh3

@mbhall88 you might want to check whether your account is limited by a RAM quota too:

% ulimit -a

data seg size           (kbytes, -d) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
etc

tseemann avatar Oct 31 '19 20:10 tseemann
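
(For reference, a minimal sketch of the inverse trick in bash, capping memory for a single run so minimap2 aborts with an allocation error instead of being killed by the OOM killer; ulimit -v takes kilobytes and, inside a subshell, only affects that run:)

# cap the address space at ~64 GB for this run only
( ulimit -v $((64 * 1024 * 1024)); minimap2 -ax map-ont -t 8 ref.fasta reads.fastq.gz > out.sam )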

@mbhall88 are you mapping to "pick" OTUs or to classify (taxonomy) reads?

You may want to try the feature-classifier within QIIME 2. You can train the classifier using information specific to your sample preparation (e.g., the primers used). More information here: https://docs.qiime2.org/2019.7/data-resources/

andrem01 avatar Nov 01 '19 00:11 andrem01

High memory usage is expected, given the repetitiveness of the reference data.

Another example: I have a sample that is 311 MB compressed, mapping to the database I previously mentioned, and 150 GB of RAM is still not enough (the same goes for tens of other samples of the same size). This is the stderr I get:

[M::mm_idx_gen::30.581*1.57] collected minimizers
[M::mm_idx_gen::33.309*2.16] sorted minimizers
[M::main::33.486*2.15] loaded/built the index for 695171 target sequence(s)
[M::mm_mapopt_update::33.701*2.14] mid_occ = 16535
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 695171
[M::mm_idx_stat::33.842*2.14] distinct minimizers: 7165551 (60.20% are singletons); average occurrences: 25.429; average spacing: 5.537

@lh3 if you are saying that this high memory usage is expected then that's fine, but I think you should clarify it in the documentation, because that is crazy high memory usage for mapping a small sample to a seemingly small reference database.

mbhall88 avatar Nov 01 '19 11:11 mbhall88

[M::mm_mapopt_update::33.701*2.14] mid_occ = 16535

The repetitiveness is really high here. The peak memory will depend on what sequences are being mapped at the same time across threads. It can vary a lot. One way is to reduce the number of threads.

lh3 avatar Nov 01 '19 12:11 lh3
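
(For reference, a minimal sketch of the reduced-thread variant of the map-ont command above; ref.fasta and reads.fastq.gz are placeholders. Fewer worker threads mean fewer query batches buffered at once, at the cost of wall-clock time:)

minimap2 -ax map-ont -t 2 ref.fasta reads.fastq.gz > out.sam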

We're seeing something similar with one of our PhD students too. She's aligning a ~1.6 GB FASTQ to a set of 16S references (n=~21000, 50Mb). The job slowly but surely consumes the 64 GB of RAM on her machine and then gets killed by the OOM killer. We've overcome it temporarily by setting the minibatch size to something lower than the 500M default with -K. I tried to diagnose by running minimap2 on one thread, the strace of the child alignment process shows it's just running mprotect and mmap until all the RAM is consumed.

SamStudio8 avatar Nov 04 '19 11:11 SamStudio8
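
(For reference, a sketch of the two steps described above, with placeholder file names; the -K value here is the one that worked in this case, not a general recommendation:)

# shrink the query minibatch below the 500M default so fewer bases are in flight per batch
minimap2 -ax map-ont -K 100M -t 8 ref.fasta reads.fastq.gz > out.sam

# watch the allocation pattern of a single-threaded run (brk/mmap/mprotect etc.);
# -o keeps the trace separate from minimap2's own stderr
strace -o strace.log -f -e trace=memory minimap2 -t 1 -ax map-ont ref.fasta reads.fastq.gz > out.sam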

The repetitiveness is really high here. The peak memory will depend on what sequences are being mapped at the same time across threads. It can vary a lot. One way is to reduce the number of threads.

I tried running with a single thread and all 20 samples still use more than 150 GB of memory.

mbhall88 avatar Nov 04 '19 14:11 mbhall88

We're seeing something similar with one of our PhD students too. She's aligning a ~1.6 GB FASTQ to a set of 16S references (n=~21000, 50Mb). The job slowly but surely consumes the 64 GB of RAM on her machine and then gets killed by the OOM killer. We've overcome it temporarily by setting the minibatch size to something lower than the 500M default with -K. I tried to diagnose by running minimap2 on one thread, the strace of the child alignment process shows it's just running mprotect and mmap until all the RAM is consumed.

I will give the minibatch approach a go. Thanks @SamStudio8

mbhall88 avatar Nov 04 '19 14:11 mbhall88

Also, could you try --no-kalloc --print-qname? Thanks.

lh3 avatar Nov 04 '19 14:11 lh3

I was able to get 20/22 samples to complete within 100Gb RAM with the following options

minimap2 -K 100M --no-kalloc --print-qname -aLx map-ont -t 1 <target> <query>

I am trying -K 50M to see if I can get the last two to finish.

mbhall88 avatar Nov 05 '19 11:11 mbhall88

I think this is related to https://github.com/lh3/minigraph/issues/7.

I am trying to align two chromosomes which are both ~800Mb.

Testing with -K 100M now.

fbemm avatar Nov 06 '19 10:11 fbemm

Crashed again with Segmentation fault (core dumped).

fbemm avatar Nov 08 '19 11:11 fbemm

Ok, so to get my samples to all run in under 50Gb RAM I had to use the following options

minimap2 -t 1 -K 25M --no-kalloc --print-qname -aLx map-ont <target> <query>

Although I don't think the --print-qname option has any impact on the memory footprint.

mbhall88 avatar Nov 08 '19 17:11 mbhall88

I met the same error. I want to do an alignment between two genomes. The command is the following: minimap2 -ax asm5 --cs -t 20 $ref $query > query2ref.sam

The size of my genome is about 960 Mb.

I do not know how to address it. Has anyone solved it?

shehongbing avatar Dec 26 '20 12:12 shehongbing
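
(A sketch applying the workarounds from earlier in this thread to this asm5 case; untested here, and the -K value may need tuning downwards:)

minimap2 -ax asm5 --cs -t 4 -K 100M --no-kalloc $ref $query > query2ref.sam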

I met the same error when doing a self-mapping; the genome size is 850 Mb.

/minimap2 -t 4 -x asm5 -o Bch.Bch.paf Bch.genome.fa Bch.genome.fa

minimap2 v2.26-r1175

Best, Kun

xiekunwhy avatar Apr 27 '24 06:04 xiekunwhy
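
(A sketch applying the same workaround to this self-mapping, assuming the -K/thread reduction discussed above carries over to PAF output; untested:)

minimap2 -x asm5 -t 1 -K 25M --no-kalloc -o Bch.Bch.paf Bch.genome.fa Bch.genome.fa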