
split-prefix and write_sam_cigar problems

Open evgenyleushkin opened this issue 1 year ago • 0 comments

Dear Heng Li and all,

After successfully using minimap2 on many occasions, I recently ran into a problem with an assembly file produced by hifiasm (v0.18). Attempting to map HiFi reads to this assembly resulted in the following warning:

"For a multi-part index, no @SQ lines will be outputted. Please use --split-prefix"

together with an empty BAM file after ~3 minutes of runtime (after collecting and sorting minimizers). When I used the --split-prefix option, the run printed a series of mapping lines, e.g.

"[M::worker_pipeline::22496.886*78.16] mapped 40005 sequences"

I again received an error message:

"minimap2: format.c:380: write_sam_cigar: Assertion `clip_len[0] < qlen && clip_len[1] < qlen' failed."

No such message was encountered when using a different assembly file.

Trying to dig deeper into this problem, I split the assembly file into several parts and found that mapping works properly (with or without --split-prefix) with the first 10110 lines. However, adding any other sequence (even a completely external one) causes the error again.
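For anyone who wants to repeat this kind of bisection, here is a minimal Python sketch of how a truncated assembly could be produced. It cuts on FASTA record boundaries rather than on raw line counts, so a wrapped sequence is never cut in the middle. The function names (`iter_fasta`, `write_first_records`) and file paths are hypothetical, and it assumes an uncompressed, plain-text FASTA; it is not part of minimap2 or hifiasm.

```python
from itertools import islice


def iter_fasta(handle):
    """Yield (header_line, sequence_lines) for each record in a FASTA handle."""
    header, lines = None, []
    for line in handle:
        if line.startswith(">"):
            if header is not None:
                yield header, lines
            header, lines = line, []
        elif header is not None:
            # Sequence line belonging to the current record.
            lines.append(line)
    if header is not None:
        yield header, lines


def write_first_records(src_path, dst_path, n_records):
    """Copy the first n_records FASTA records to dst_path; return how many were written."""
    written = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for header, lines in islice(iter_fasta(src), n_records):
            dst.write(header)
            dst.writelines(lines)
            written += 1
    return written
```

Calling `write_first_records("assembly.fa", "subset.fa", n)` with increasing `n` (both paths are placeholders) would give a series of subsets to map against until the first failing one is found.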

I can also provide the assembly file (2.2Gb) and the truncated version, which works, if needed.

In principle we could simply discard the few short contigs that cause problems (<0.1% of the data), but for consistency and for future projects it is important for us to understand why this very peculiar issue arises in the first place.
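As a rough illustration of that workaround, the sketch below lists contigs under a length cutoff and the share of total bases they represent, so the effect of discarding them can be checked first. The helper names and the cutoff are hypothetical choices for illustration, and it assumes an uncompressed FASTA; whether short contigs are actually what triggers the assertion is exactly the open question here.

```python
def contig_lengths(path):
    """Return a dict mapping each contig name to its sequence length."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # Use the first whitespace-delimited token as the contig name.
                fields = line[1:].split()
                name = fields[0] if fields else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def short_contigs(lengths, min_len):
    """Return the sorted names of contigs shorter than min_len bases."""
    return sorted(name for name, n in lengths.items() if n < min_len)


def short_fraction(lengths, min_len):
    """Return the fraction of all bases that sit in contigs shorter than min_len."""
    total = sum(lengths.values())
    if total == 0:
        return 0.0
    short = sum(n for n in lengths.values() if n < min_len)
    return short / total
```

For example, `short_fraction(contig_lengths("assembly.fa"), 10000)` would report how much of the assembly lies in contigs under 10 kb (the path and the cutoff are placeholders).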

Best,

Evgeny Leushkin

evgenyleushkin, May 02 '23 11:05