minimap2
minimap2 copied to clipboard
lchain.c:59: mg_chain_backtrack: Assertion `n_v < (2147483647)' failed
Hi, I am trying to run minimap2 as part of purge_haplotigs, but minimap2 gives me an error which I think is caused by minimap2 and not by purge_haplotigs:
$ cat tmp_purge_haplotigs/STDERR/minimap2.stderr
[M::main::11.424*1.00] loaded/built the index for 164466 target sequence(s)
[M::mm_mapopt_update::13.160*1.00] mid_occ = 610
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 164466
[M::mm_idx_stat::14.168*1.00] distinct minimizers: 71508996 (36.40% are singletons); average occurrences: 8.176; average spacing: 6.864; total length: 4013351444
[M::worker_pipeline::26735.810*1.08] mapped 28358 sequences
minimap2: lchain.c:59: mg_chain_backtrack: Assertion `n_v < (2147483647)' failed.
Aborted (core dumped)
purge_haplotigs called minimap2 with the following parameters:
minimap2 parameters: '-p 1e-5 -f 0.001 -N 100000'
I am using minimap2 version 2.20:
$ minimap2 --version
2.20-r1061
This means minimap2 sees too many candidate chains. For a temporary remedy, you may add option -n5
(or even -n10
) to increase the minimum chain length and/or -xasm20
to use longer and fewer k-mers.
Note that earlier versions of minimap2 has the same restriction but it may lead to a silent failure for that query sequence.
Thanks a lot for the possibilities to work around this! It took me a week to try them all out. I found that neither -n5
nor -n10
worked for my input (same error), but -xasm20
did work!
And indeed, I tried older versions of minimap2 as well and they simply crashed without the assertion error.
Thanks for the confirmation. I will leave this issue open in case I may find a more robust solution in future.
Hi, I have been having the same issue, but neither -n5, -n10 or -xasm20 have helped. Could it be due to very large chromosomes. Some of the chromosomes exceed 2Gb in length.
All the best, Agnieszka
No plan to support long chromosomes, unfortunately.
Thanks! I assume 2147483647 is the upper limit? Mapping short contigs to long chromosomes works fine, but aligning two long chromosomes fails. I can just split them to avoid hitting the limit. What maximum size would you recommend?
2147483647 is the limit
What is "long" in this context? I have contigs that are about a gigabase in size (longest are 1.25 GB) and am running into the same error:
minimap2: lchain.c:73: mg_chain_backtrack: Assertion `n_v < (2147483647)' failed
.... Is this expected behavior? How much do you think I'd need to chop up my contigs in order to get minimap2 to run?
EDIT: To be clear, I'm attempting to align two different reference genomes for closely related organisms. The longest sequence in each fasta is about 1.25 gigabases. Is n_v the sum of the length of the two sequences that it is trying to backtrack through?
I think it is the sum of the two sequences. @nhartwic did you figure out and chose aligner yet?
I split the contigs in my one of my fasta files such that the largest contigs were around 300 MB. I then used those split contigs and the other fasta file and minimap2 was able to run so I just used those results. I didn't attempt to optimize the split size once it worked. I just needed some quick dotplots.
@nhartwic Thanks! how did you split?
Honestly, I just did it in a python session. Code looked something like...
from Bio import SeqIO
fout = open('chunked.fasta', 'w')
for record in SeqIO.parse("raw.fasta", "fasta"):
seq = record.seq
ctg = record.id
init_cs = 3 * 10**8
nchunks = int(len(seq) / init_cs) + 1
cs = int( len(seq) / nchunks) + 1
for i in range(0, nchunks):
s = i * cs
e = (i+1) * cs
hout.write(f'>{ctg}.{i}\n{seq[s:e]}\n')
@nhartwic thank you again. So my assembly should be raw.fasta or chunked.fasta?
"chunked.fasta" is where its writing output to. "raw.fasta" is where its reading from. Those are just strings though, you can point them wherever. And the code I wrote was just off the cuff again. It probably includes errors. For example, that last line should say 'fout.write' not 'hout.write'. Typo.
Please check the previous conversation:
This means minimap2 sees too many candidate chains. For a temporary remedy, you may add option -n5 (or even -n10) to increase the minimum chain length and/or -xasm20 to use longer and fewer k-mers.
I assume 2147483647 is the upper limit?
2147483647 is the limit