minimap2 icon indicating copy to clipboard operation
minimap2 copied to clipboard

lchain.c:59: mg_chain_backtrack: Assertion `n_v < (2147483647)' failed

Open dirkjanvw opened this issue 3 years ago • 15 comments

Hi, I am trying to run minimap2 as part of purge_haplotigs, but minimap2 gives me an error which I think is caused by minimap2 and not by purge_haplotigs:

$ cat tmp_purge_haplotigs/STDERR/minimap2.stderr 
[M::main::11.424*1.00] loaded/built the index for 164466 target sequence(s)
[M::mm_mapopt_update::13.160*1.00] mid_occ = 610
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 164466
[M::mm_idx_stat::14.168*1.00] distinct minimizers: 71508996 (36.40% are singletons); average occurrences: 8.176; average spacing: 6.864; total length: 4013351444
[M::worker_pipeline::26735.810*1.08] mapped 28358 sequences
minimap2: lchain.c:59: mg_chain_backtrack: Assertion `n_v < (2147483647)' failed.
Aborted (core dumped)

purge_haplotigs called minimap2 with the following parameters: minimap2 parameters: '-p 1e-5 -f 0.001 -N 100000'

I am using minimap2 version 2.20:

$ minimap2 --version
2.20-r1061

dirkjanvw avatar Jun 10 '21 06:06 dirkjanvw

This means minimap2 sees too many candidate chains. For a temporary remedy, you may add option -n5 (or even -n10) to increase the minimum chain length and/or -xasm20 to use longer and fewer k-mers.

Note that earlier versions of minimap2 has the same restriction but it may lead to a silent failure for that query sequence.

lh3 avatar Jun 10 '21 13:06 lh3

Thanks a lot for the possibilities to work around this! It took me a week to try them all out. I found that neither -n5 nor -n10 worked for my input (same error), but -xasm20 did work!

And indeed, I tried older versions of minimap2 as well and they simply crashed without the assertion error.

dirkjanvw avatar Jun 17 '21 12:06 dirkjanvw

Thanks for the confirmation. I will leave this issue open in case I may find a more robust solution in future.

lh3 avatar Jun 28 '21 03:06 lh3

Hi, I have been having the same issue, but neither -n5, -n10 or -xasm20 have helped. Could it be due to very large chromosomes. Some of the chromosomes exceed 2Gb in length.

All the best, Agnieszka

agolicz avatar Oct 25 '21 09:10 agolicz

No plan to support long chromosomes, unfortunately.

lh3 avatar Oct 25 '21 20:10 lh3

Thanks! I assume 2147483647 is the upper limit? Mapping short contigs to long chromosomes works fine, but aligning two long chromosomes fails. I can just split them to avoid hitting the limit. What maximum size would you recommend?

agolicz avatar Oct 25 '21 21:10 agolicz

2147483647 is the limit

lh3 avatar Oct 25 '21 21:10 lh3

What is "long" in this context? I have contigs that are about a gigabase in size (longest are 1.25 GB) and am running into the same error:

minimap2: lchain.c:73: mg_chain_backtrack: Assertion `n_v < (2147483647)' failed

.... Is this expected behavior? How much do you think I'd need to chop up my contigs in order to get minimap2 to run?

EDIT: To be clear, I'm attempting to align two different reference genomes for closely related organisms. The longest sequence in each fasta is about 1.25 gigabases. Is n_v the sum of the length of the two sequences that it is trying to backtrack through?

nhartwic avatar Feb 27 '22 20:02 nhartwic

I think it is the sum of the two sequences. @nhartwic did you figure out and chose aligner yet?

laxmangene7 avatar Mar 17 '22 03:03 laxmangene7

I split the contigs in my one of my fasta files such that the largest contigs were around 300 MB. I then used those split contigs and the other fasta file and minimap2 was able to run so I just used those results. I didn't attempt to optimize the split size once it worked. I just needed some quick dotplots.

nhartwic avatar Mar 17 '22 06:03 nhartwic

@nhartwic Thanks! how did you split?

laxmangene7 avatar Mar 17 '22 06:03 laxmangene7

Honestly, I just did it in a python session. Code looked something like...

from Bio import SeqIO

fout = open('chunked.fasta', 'w')

for record in SeqIO.parse("raw.fasta", "fasta"):
    seq = record.seq
    ctg = record.id
    init_cs = 3 * 10**8
    nchunks = int(len(seq) / init_cs) + 1
    cs = int( len(seq) / nchunks) + 1
    for i in range(0, nchunks):
        s = i * cs
        e = (i+1) * cs
        hout.write(f'>{ctg}.{i}\n{seq[s:e]}\n')

nhartwic avatar Mar 17 '22 06:03 nhartwic

@nhartwic thank you again. So my assembly should be raw.fasta or chunked.fasta?

laxmangene7 avatar Mar 17 '22 06:03 laxmangene7

"chunked.fasta" is where its writing output to. "raw.fasta" is where its reading from. Those are just strings though, you can point them wherever. And the code I wrote was just off the cuff again. It probably includes errors. For example, that last line should say 'fout.write' not 'hout.write'. Typo.

nhartwic avatar Mar 17 '22 06:03 nhartwic

Please check the previous conversation:

This means minimap2 sees too many candidate chains. For a temporary remedy, you may add option -n5 (or even -n10) to increase the minimum chain length and/or -xasm20 to use longer and fewer k-mers.

I assume 2147483647 is the upper limit?

2147483647 is the limit

lh3 avatar Mar 17 '22 13:03 lh3