minimap2
API: Understanding why example.c results differ from commandline results
Hi,
I am attempting to use your example.c as a template for aligning some assembly haplotypes: https://github.com/rlorigro/GFAse/blob/0ce1fb2990bce9b0f4ab27dade77ff1263402deb/src/test/test_minimap2.cpp#L39
Target: https://github.com/rlorigro/GFAse/raw/main/data/long_challenge_node.fasta
Query: https://github.com/rlorigro/GFAse/raw/main/data/short_challenge_nodes.fasta
My adaptation of example.c tries to apply the asm20 preset, align the queries, and print the results, but it produces no CIGAR output and doesn't appear to locate minimizers properly.
The verbose output differs from that of the command-line run, most notably in the reported mid_occ value:
API (example.c)
k=19
min_mid_occ=50
max_mid_occ=500
[M::mm_idx_gen::1659998794.654*0.00] collected minimizers
[M::mm_idx_gen::1659998795.974*0.00] sorted minimizers
[M::mm_mapopt_update::1659998795.974*0.00] mid_occ = 1019154007
Killed
This implementation often, but not always, runs out of memory, so the output above is truncated.
Minimap2 commandline:
$ minimap2 -x asm20 --secondary=no --eqx -c long_challenge_node.fasta short_challenge_nodes.fasta > test.paf
[M::mm_idx_gen::1.749*1.00] collected minimizers
[M::mm_idx_gen::2.015*1.26] sorted minimizers
[M::main::2.015*1.26] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::2.116*1.25] mid_occ = 130
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::2.185*1.24] distinct minimizers: 11860200 (96.13% are singletons); average occurrences: 1.248; average spacing: 5.497; total length: 81336785
[M::worker_pipeline::77.038*1.64] mapped 2 sequences
[M::main] Version: 2.23-r1111
[M::main] CMD: minimap2 -x asm20 --secondary=no --eqx -c long_challenge_node.fasta short_challenge_nodes.fasta
[M::main] Real time: 77.049 sec; CPU: 126.467 sec; Peak RSS: 3.813 GB
Am I missing a step for indexing? Why is the example.c implementation giving different results?
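One possible explanation for the implausible mid_occ value, offered as a hypothesis and not a confirmed diagnosis: the upstream example.c calls mm_set_opt(0, &iopt, &mopt) to set all defaults before anything else, and later calls mm_mapopt_update(&mopt, mi) once the index is read. According to the comment in minimap.h, a named preset only applies to the parameters it affects, so if the defaults call were skipped, any field the preset does not touch would keep whatever was in memory. The sketch below is a toy model of that situation in plain C. The struct, the function names, and the numbers are all hypothetical stand-ins, not minimap2's actual types or code, and the "only compute when unset" rule in the update step is an assumption about how such an update might behave.

```c
#include <string.h>

/* Hypothetical stand-in for a mapping-options struct (NOT mm_mapopt_t). */
typedef struct {
    int mid_occ;      /* occurrence cutoff; <= 0 means "derive it from the index" */
    int min_mid_occ;  /* lower clamp applied when the cutoff is derived */
    int max_mid_occ;  /* upper clamp applied when the cutoff is derived */
} toy_mapopt_t;

/* Step 1: put every field into a known state (toy default values). */
void toy_set_defaults(toy_mapopt_t *o)
{
    memset(o, 0, sizeof(*o));
    o->min_mid_occ = 10;
    o->max_mid_occ = 1000000;
}

/* Step 2: a preset overrides only the fields it cares about. */
void toy_apply_preset(toy_mapopt_t *o)
{
    o->min_mid_occ = 50;
    o->max_mid_occ = 500;
}

/* Step 3: derive the cutoff from an index statistic, but only if it is unset
 * (assumed behaviour for this illustration). */
void toy_update(toy_mapopt_t *o, int occ_from_index)
{
    if (o->mid_occ <= 0) {
        o->mid_occ = occ_from_index;
        if (o->mid_occ < o->min_mid_occ)
            o->mid_occ = o->min_mid_occ;
        if (o->max_mid_occ > o->min_mid_occ && o->mid_occ > o->max_mid_occ)
            o->mid_occ = o->max_mid_occ;
    }
}

/* Runs steps 1-3 and returns the final cutoff. Passing with_defaults = 0
 * skips step 1. The struct is pre-filled with a fixed junk byte so the effect
 * is reproducible; real uninitialized memory would be unpredictable. */
int toy_run(int with_defaults, int occ_from_index)
{
    toy_mapopt_t o;
    memset(&o, 0x3c, sizeof(o)); /* deterministic stand-in for stack garbage */
    if (with_defaults)
        toy_set_defaults(&o);
    toy_apply_preset(&o);
    toy_update(&o, occ_from_index);
    return o.mid_occ;
}
```

In this toy model, toy_run(1, 130) yields 130, while toy_run(0, 130) returns the junk value unchanged: the preset still sets the two clamps, but the update step sees a positive mid_occ and never computes or clamps it. That pattern would be consistent with a log showing sensible min_mid_occ and max_mid_occ next to a very large mid_occ.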