
[morecore] insufficient memory

Open mictadlo opened this issue 4 years ago • 19 comments

Hi, I used minimap2 (2.17-r941). NbV1ChF.fasta and ragoo.fasta are 2.8 GB and 3.4 GB, respectively. However, I ran out of memory on a 2 TB machine.

minimap2 -ax asm5 --cs -t8 ../../NbV1ChF.fasta ../ragoo.fasta > pm_against_ref.sam 2> pm_contigs_against_ref.sam.log
/bin/bash: line 1: 10389 Aborted                 (core dumped) minimap2 -ax asm5 --cs -t8 ../../NbV1ChF.fasta ../ragoo.fasta > pm_against_ref.sam 2> pm_contigs_against_ref.sam.log
less pm_contigs_against_ref.sam.log 
[M::mm_idx_gen::55.511*1.79] collected minimizers
[M::mm_idx_gen::60.192*2.17] sorted minimizers
[M::main::60.192*2.17] loaded/built the index for 19 target sequence(s)
[M::mm_mapopt_update::64.517*2.10] mid_occ = 342
[M::mm_idx_stat] kmer size: 19; skip: 19; is_hpc: 0; #seq: 19
[M::mm_idx_stat::69.857*2.01] distinct minimizers: 143929053 (83.21% are singletons); average occurrences: 1.928; average spacing: 9.884
[M::worker_pipeline::3024.805*3.40] mapped 5 sequences
[M::worker_pipeline::4921.171*3.81] mapped 6 sequences
[M::worker_pipeline::7761.560*3.75] mapped 4 sequences
[morecore] insufficient memory

What could cause such a high memory consumption? How much more memory do I need?

Thank you in advance,

Michal

mictadlo avatar Oct 28 '19 22:10 mictadlo

That may be caused by integer overflow. I may need to reproduce the issue to fix it...

lh3 avatar Oct 28 '19 23:10 lh3

I am seeing a similar issue with a reference fasta that is 1.1 GB and reads around 2 GB. However, I am not seeing the [morecore] insufficient memory error; instead, I am getting kicked off my interactive job, or the job fails after hitting memory limits in excess of 24 GB.

Of possible note, my reference file does have some ambiguous bases (N) in it. Not sure if that makes a difference?

I am using version 2.15-r905

Example command I am using is

minimap2 -aLx map-ont -t 8 ref.fasta reads.fastq.gz > out.sam

Dropping the -L option doesn't seem to make a difference either.

mbhall88 avatar Oct 31 '19 18:10 mbhall88

For whole mammalian genome alignment, using >20 GB of memory is expected, especially when you have chromosome-level assemblies and request many threads. However, there must be something going wrong if 2 TB is still "insufficient".

lh3 avatar Oct 31 '19 18:10 lh3

In my use case, the reference is a 16S rRNA database (sequences all around 1.5Kb) and the reads are also from 16S rRNA sequencing (reads have been filtered to 1.4-1.6Kb). All cDNA.

For one or two of the samples, I was still hitting memory limits with 64Gb of memory.

mbhall88 avatar Oct 31 '19 18:10 mbhall88

High memory usage is expected given the repetitiveness of the reference data.

lh3 avatar Oct 31 '19 18:10 lh3

@mbhall88 you might want to check whether your account is limited by a RAM quota too:

% ulimit -a

data seg size           (kbytes, -d) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
etc

tseemann avatar Oct 31 '19 20:10 tseemann
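
(For reference, a minimal sketch of the inverse trick in bash, capping memory for a single run so minimap2 aborts with an allocation error instead of being killed by the OOM killer; ulimit -v takes kilobytes and, inside a subshell, only affects that run:)

# cap the address space at ~64 GB for this run only
( ulimit -v $((64 * 1024 * 1024)); minimap2 -ax map-ont -t 8 ref.fasta reads.fastq.gz > out.sam )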

@mbhall88 are you mapping to "pick" OTUs or to classify (taxonomy) reads?

You may want to try the feature-classifier within QIIME 2. You can train the classifier using information specific to your sample preparation (e.g., the primers used). More information here: https://docs.qiime2.org/2019.7/data-resources/

andrem01 avatar Nov 01 '19 00:11 andrem01

High memory usage is expected, given the repetitiveness of the reference data.

Another example: I have a sample that is 311 MB compressed, mapping to the database I previously mentioned, and 150 GB of RAM is still not enough (the same goes for tens of other samples of the same size). This is the stderr I get:

[M::mm_idx_gen::30.581*1.57] collected minimizers
[M::mm_idx_gen::33.309*2.16] sorted minimizers
[M::main::33.486*2.15] loaded/built the index for 695171 target sequence(s)
[M::mm_mapopt_update::33.701*2.14] mid_occ = 16535
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 695171
[M::mm_idx_stat::33.842*2.14] distinct minimizers: 7165551 (60.20% are singletons); average occurrences: 25.429; average spacing: 5.537

@lh3 if you are saying that this high memory usage is expected then that's fine, but I think you should clarify it in the documentation, because that is crazy high memory usage for mapping a small sample to a seemingly small reference database.

mbhall88 avatar Nov 01 '19 11:11 mbhall88

[M::mm_mapopt_update::33.701*2.14] mid_occ = 16535

The repetitiveness is really high here. The peak memory will depend on what sequences are being mapped at the same time across threads. It can vary a lot. One way is to reduce the number of threads.

lh3 avatar Nov 01 '19 12:11 lh3
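
(For reference, a minimal sketch of the reduced-thread variant of the map-ont command above; ref.fasta and reads.fastq.gz are placeholders. Fewer worker threads mean fewer query batches buffered at once, at the cost of wall-clock time:)

minimap2 -ax map-ont -t 2 ref.fasta reads.fastq.gz > out.sam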

We're seeing something similar with one of our PhD students too. She's aligning a ~1.6 GB FASTQ to a set of 16S references (n=~21000, 50Mb). The job slowly but surely consumes the 64 GB of RAM on her machine and then gets killed by the OOM killer. We've overcome it temporarily by setting the minibatch size to something lower than the 500M default with -K. I tried to diagnose by running minimap2 on one thread, the strace of the child alignment process shows it's just running mprotect and mmap until all the RAM is consumed.

SamStudio8 avatar Nov 04 '19 11:11 SamStudio8
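
(For reference, a sketch of the two steps described above, with placeholder file names; the -K value here is the one that worked in this case, not a general recommendation:)

# shrink the query minibatch below the 500M default so fewer bases are in flight per batch
minimap2 -ax map-ont -K 100M -t 8 ref.fasta reads.fastq.gz > out.sam

# watch the allocation pattern of a single-threaded run (brk/mmap/mprotect etc.);
# -o keeps the trace separate from minimap2's own stderr
strace -o strace.log -f -e trace=memory minimap2 -t 1 -ax map-ont ref.fasta reads.fastq.gz > out.sam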

The repetitiveness is really high here. The peak memory will depend on what sequences are being mapped at the same time across threads. It can vary a lot. One way is to reduce the number of threads.

I tried running with a single thread and all 20 samples still use more than 150 GB of memory.

mbhall88 avatar Nov 04 '19 14:11 mbhall88

We're seeing something similar with one of our PhD students too. She's aligning a ~1.6 GB FASTQ to a set of 16S references (n=~21000, 50Mb). The job slowly but surely consumes the 64 GB of RAM on her machine and then gets killed by the OOM killer. We've overcome it temporarily by setting the minibatch size to something lower than the 500M default with -K. I tried to diagnose by running minimap2 on one thread, the strace of the child alignment process shows it's just running mprotect and mmap until all the RAM is consumed.

I will give the minibatch approach a go. Thanks @SamStudio8

mbhall88 avatar Nov 04 '19 14:11 mbhall88

Also, could you try --no-kalloc --print-qname? Thanks.

lh3 avatar Nov 04 '19 14:11 lh3

I was able to get 20/22 samples to complete within 100Gb RAM with the following options

minimap2 -K 100M --no-kalloc --print-qname -aLx map-ont -t 1 <target> <query>

I am trying -K 50M to see if I can get the last two to finish.

mbhall88 avatar Nov 05 '19 11:11 mbhall88

I think this is related to https://github.com/lh3/minigraph/issues/7.

I am trying to align two chromosomes which are both ~800Mb.

Testing with -K 100M now.

fbemm avatar Nov 06 '19 10:11 fbemm

Crashed again with Segmentation fault (core dumped).

fbemm avatar Nov 08 '19 11:11 fbemm

Ok, so to get my samples to all run in under 50Gb RAM I had to use the following options

minimap2 -t 1 -K 25M --no-kalloc --print-qname -aLx map-ont <target> <query>

Although I don't think the --print-qname option has any impact on the memory footprint.

mbhall88 avatar Nov 08 '19 17:11 mbhall88

I met the same error. I want to do an alignment between two genomes. The command is the following: minimap2 -ax asm5 --cs -t 20 $ref $query > query2ref.sam

The size of my genome is about 960 Mb.

I do not know how to address it. Has anyone solved it?

shehongbing avatar Dec 26 '20 12:12 shehongbing
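
(A sketch applying the workarounds from earlier in this thread to this asm5 case; untested here, and the -K value may need tuning downwards:)

minimap2 -ax asm5 --cs -t 4 -K 100M --no-kalloc $ref $query > query2ref.sam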

I met the same error when doing a self-mapping; the genome size is 850 Mb.

/minimap2 -t 4 -x asm5 -o Bch.Bch.paf Bch.genome.fa Bch.genome.fa

minimap2 v2.26-r1175

Best, Kun

xiekunwhy avatar Apr 27 '24 06:04 xiekunwhy
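
(A sketch applying the same workaround to this self-mapping, assuming the -K/thread reduction discussed above carries over to PAF output; untested:)

minimap2 -x asm5 -t 1 -K 25M --no-kalloc -o Bch.Bch.paf Bch.genome.fa Bch.genome.fa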