ksw2

Potential feature request: Fitting / infix alignment

Open rob-p opened this issue 5 years ago • 2 comments

Hi @lh3,

It would be very useful to us to have support in ksw2 for "infix" or "fitting" alignment, where the query aligns within the target but the precise starting position of the query is not known. This comes up for us when using ksw2 to score alignments of short reads that have an indel near the beginning of the read. Extension alignment assumes that the starting position of the alignment is known, which is incorrect when an insertion occurs at the beginning of the read, before the first seed position, so it ends up computing highly sub-optimal scores in these cases. The alignments themselves are, of course, optimal given the assumption on the starting position. However, being able to "pad" the target sequence and let the alignment call figure out the best position at which to start the alignment would be very useful in such cases.

I understand if this is too much work, or if this functionality is completely extraneous to you; in that case, please feel free to just let me know and close this feature request. I was also wondering whether you encounter such cases in minimap2 and bwa-mem and, if so, how you handle such alignments.

Thanks! Rob

rob-p avatar Jan 07 '19 16:01 rob-p
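
For readers unfamiliar with the term, the fitting alignment requested above can be sketched with plain dynamic programming. The snippet below is only an illustration under simplifying assumptions: it uses a linear gap penalty rather than ksw2's affine scoring, it is not SIMD-accelerated, and the function name `fitting_align` and its parameters are hypothetical, not part of the ksw2 API. The whole query must be aligned, while any prefix and suffix of the (padded) target may be skipped for free.

```python
def fitting_align(query, target, match=2, mismatch=-4, gap=-4):
    """Return (score, t_start, t_end) for the best fitting alignment.

    The full query is aligned to target[t_start:t_end]; unaligned target
    prefix and suffix cost nothing. Linear gap penalty (an assumption).
    """
    m, n = len(query), len(target)
    # Each cell holds (score, target start of the alignment reaching it).
    # Row 0: empty query, so the alignment may start at any target offset.
    prev = [(0, j) for j in range(n + 1)]
    for i in range(1, m + 1):
        # Column 0: query[:i] aligned entirely against gaps.
        cur = [(prev[0][0] + gap, 0)]
        for j in range(1, n + 1):
            s = match if query[i - 1] == target[j - 1] else mismatch
            diag = (prev[j - 1][0] + s, prev[j - 1][1])    # base vs base
            up = (prev[j][0] + gap, prev[j][1])            # query base vs gap
            left = (cur[j - 1][0] + gap, cur[j - 1][1])    # target base vs gap
            cur.append(max(diag, up, left, key=lambda c: c[0]))
        prev = cur
    # Free end in the target: take the best cell of the last row.
    t_end = max(range(n + 1), key=lambda j: prev[j][0])
    score, t_start = prev[t_end]
    return score, t_start, t_end


print(fitting_align("ACGT", "TTTACGTTTT"))
```

The only differences from global alignment are the zero-initialised first row and the maximum taken over the last row, which is why the start position does not have to be known in advance.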

Such a feature would also be a big help for me. Actually, I could use an even more general one: a way to specify where the gaps in one of the two sequences are expected and should not be penalized.

(I have only a very basic understanding of sequence alignment. Should I look for a different algorithm? If it's not too hard to modify ksw_gg or another ksw function to make the gap opening penalty dependent on position, I'd also appreciate any hints.)

wojdyr avatar Sep 12 '19 18:09 wojdyr
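
As a rough illustration of the position-dependent gap penalty asked about above, here is a minimal global alignment with affine gaps in which gaps in the query are free at chosen positions. This is a textbook three-matrix (Gotoh-style) recurrence, not a modification of ksw_gg; the function name `align_free_gaps` and the `free_after` parameter are hypothetical. A gap of length k is assumed to cost `gap_open + k * gap_ext`, and `free_after` lists how many query bases precede each position where a gap in the query costs nothing.

```python
def align_free_gaps(query, target, free_after=(), match=2, mismatch=-4,
                    gap_open=4, gap_ext=2):
    """Global alignment score with affine gaps.

    Gaps in the query opened after i query bases are free when i is in
    free_after (hypothetical parameter); all other gaps cost
    gap_open + k * gap_ext for a gap of length k.
    """
    NEG = float("-inf")
    m, n = len(query), len(target)
    free = set(free_after)
    H = [[NEG] * (n + 1) for _ in range(m + 1)]  # best score per cell
    E = [[NEG] * (n + 1) for _ in range(m + 1)]  # ends with a gap in the query
    F = [[NEG] * (n + 1) for _ in range(m + 1)]  # ends with a gap in the target
    H[0][0] = 0
    for i in range(m + 1):
        # Position-dependent cost for gaps in the query on this row.
        o, e = (0, 0) if i in free else (gap_open, gap_ext)
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            if j > 0:
                E[i][j] = max(H[i][j - 1] - o - e, E[i][j - 1] - e)
            if i > 0:
                F[i][j] = max(H[i - 1][j] - gap_open - gap_ext,
                              F[i - 1][j] - gap_ext)
            diag = NEG
            if i > 0 and j > 0:
                s = match if query[i - 1] == target[j - 1] else mismatch
                diag = H[i - 1][j - 1] + s
            H[i][j] = max(diag, E[i][j], F[i][j])
    return H[m][n]


print(align_free_gaps("ACGT", "ACTTTGT"))                  # gap penalized
print(align_free_gaps("ACGT", "ACTTTGT", free_after={2}))  # gap free after "AC"
```

The only change from the standard recurrence is that the open and extend costs for one gap type are looked up per row instead of being constants.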

When I googled "infix", I found this issue. Then I realized that I had missed this question for 3 years. Sorry.

It is non-trivial to implement infix alignment with the Suzuki-Kasahara algorithm. Banded alignment wouldn't work well with infix alignment, either. Minimap2 extends from seeds, so it doesn't need to do infix alignment. Actually, infix alignment can be misleading in the presence of adapters, SVs or other types of chimeras. I prefer to avoid this type of alignment.

lh3 avatar Feb 03 '22 03:02 lh3