minimap2

An option to output SEQ field for secondary alignment

Open fenderglass opened this issue 3 years ago • 4 comments

Hi!

I have been using minimap2 as part of the Flye pipeline. I use secondary alignments during consensus/polishing to account for possible duplications in disjointigs and to improve the base quality of long repeats. Storing the SEQ field for secondary alignments in SAM/BAM files makes parsing the alignment file much easier, since each alignment can then be processed independently.
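To illustrate the parsing problem: a secondary alignment (SAM flag bit 0x100) written without SEQ carries `*` in that field, so a parser has to look up another record for the same read to recover the bases. A minimal sketch in plain Python, assuming tab-separated SAM text; the read name, contig names, and coordinates are made up for illustration:

```python
SECONDARY = 0x100  # SAM flag bit: secondary alignment

def parse_sam_line(line):
    """Split one SAM alignment line into the few fields used here."""
    fields = line.rstrip("\n").split("\t")
    return {
        "qname": fields[0],
        "flag": int(fields[1]),
        "rname": fields[2],
        "cigar": fields[5],
        "seq": fields[9],
    }

def secondary_without_seq(lines):
    """Return secondary records whose SEQ field is the '*' placeholder."""
    records = [parse_sam_line(l) for l in lines if not l.startswith("@")]
    return [r for r in records if r["flag"] & SECONDARY and r["seq"] == "*"]

# Hypothetical records: one primary and one secondary alignment of read1
sam_lines = [
    "@HD\tVN:1.6",
    "read1\t0\tctg1\t100\t60\t8M\t*\t0\t0\tACGTACGT\t*",
    "read1\t256\tctg2\t500\t0\t8M\t*\t0\t0\t*\t*",
]
missing = secondary_without_seq(sam_lines)
print([(r["qname"], r["rname"]) for r in missing])
```

Records returned by `secondary_without_seq` are the ones that cannot be processed on their own.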

The existing -Y option solves the problem, but it also forces supplementary alignments to use soft clipping, which dramatically increases the alignment file size on some datasets. I therefore added a new option, --secondary-seq, that enables output of SEQ for secondary alignments and uses hard clipping for both supplementary and secondary alignments.
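The size difference comes from how the SAM format treats the two clipping operations: soft-clipped bases (`S`) are stored in SEQ, while hard-clipped bases (`H`) are not. A rough sketch of that rule in Python; the CIGAR strings are invented examples, not output from a real run:

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations whose bases are present in SEQ (per the SAM specification)
IN_SEQ = set("MIS=X")

def stored_seq_length(cigar):
    """Number of bases a record with this CIGAR stores in its SEQ field."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in IN_SEQ)

# Hypothetical supplementary alignment covering 1 kb of a 10 kb read
soft = "9000S1000M"  # soft clipping: clipped bases stay in SEQ
hard = "9000H1000M"  # hard clipping: clipped bases are dropped from SEQ

print(stored_seq_length(soft))  # 10000
print(stored_seq_length(hard))  # 1000
```

With soft clipping, every supplementary record repeats the full read, which is why hard clipping keeps the file smaller.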

The option has been extensively tested as part of the Flye pipeline for almost a year. However, I have not tested it outside of a genome assembly setting.

Best, Mikhail

fenderglass avatar Nov 25 '20 20:11 fenderglass

It would be really great if this patch could be applied to enable building flye smoothly.

tillea avatar May 05 '22 08:05 tillea

Hi all,

I will be happy to bring this pull request up to date with master once you confirm that you are interested in adopting it. The conflict is due to new command-line options, so it is an easy fix.

fenderglass avatar May 05 '22 17:05 fenderglass

@fenderglass, I don't know how the pull request works, but I just posted the same request here. It would be great if sequences for secondary mappings could be turned on. Thanks!

olechnwin avatar May 19 '22 14:05 olechnwin

I just saw this PR. I made a program that tries to help add SEQ to secondary alignments as a post-processing step, though this PR would be interesting too! https://github.com/cmdcolin/secondary_rewriter

cmdcolin avatar Sep 14 '22 21:09 cmdcolin