
API: Understanding why example.c results differ from commandline results

Open rlorigro opened this issue 1 year ago • 0 comments

Hi,

I am attempting to use your example.c as a template for aligning some assembly haplotypes: https://github.com/rlorigro/GFAse/blob/0ce1fb2990bce9b0f4ab27dade77ff1263402deb/src/test/test_minimap2.cpp#L39

Target: https://github.com/rlorigro/GFAse/raw/main/data/long_challenge_node.fasta

Query: https://github.com/rlorigro/GFAse/raw/main/data/short_challenge_nodes.fasta

My adaptation of example.c tries to apply the asm20 preset, align the queries, and print the results, but it produces no CIGAR output and doesn't appear to locate minimizers properly.

The verbose output differs from that of the commandline run, most notably in the mid_occ value:

API (example.c)

k=19
min_mid_occ=50
max_mid_occ=500
[M::mm_idx_gen::1659998794.654*0.00] collected minimizers
[M::mm_idx_gen::1659998795.974*0.00] sorted minimizers
[M::mm_mapopt_update::1659998795.974*0.00] mid_occ = 1019154007
Killed

This implementation often, but not always, runs out of memory and is killed, so the output above is truncated.

Minimap2 commandline:

$ minimap2 -x asm20 --secondary=no --eqx -c long_challenge_node.fasta short_challenge_nodes.fasta > test.paf
[M::mm_idx_gen::1.749*1.00] collected minimizers
[M::mm_idx_gen::2.015*1.26] sorted minimizers
[M::main::2.015*1.26] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::2.116*1.25] mid_occ = 130
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::2.185*1.24] distinct minimizers: 11860200 (96.13% are singletons); average occurrences: 1.248; average spacing: 5.497; total length: 81336785
[M::worker_pipeline::77.038*1.64] mapped 2 sequences
[M::main] Version: 2.23-r1111
[M::main] CMD: minimap2 -x asm20 --secondary=no --eqx -c long_challenge_node.fasta short_challenge_nodes.fasta
[M::main] Real time: 77.049 sec; CPU: 126.467 sec; Peak RSS: 3.813 GB

Am I missing a step for indexing? Why is the example.c implementation giving different results?

rlorigro, Aug 08 '22 23:08