Semi-global alignment ratio?

Open jianshu93 opened this issue 1 year ago • 0 comments

Hello Team,

I have long PacBio reads (1 kb to 15 kb) that I want to align pairwise. The reads can be highly similar (e.g., > 90% sequence identity), but only within the region where they overlap: two sequences might align over only 50% of their length, with very high identity inside that aligned region. So I need semi-global alignment, and I need the aligned length so that I can calculate the alignment ratio. It would be nice to also detect pairs slightly below 90% identity, e.g., 85%.

I could not find whether block aligner can do semi-global alignment, that is, an alignment in which gaps at both ends are not penalized (or get only a very small penalty, as in vsearch --allpairs_global, with those low-penalty end gaps left out of the final alignment). I also read the adaptive banded aligner paper, but I am still not sure how to compute the aligned ratio (e.g., aligned positions / query length). I guess both adaptive banded DP and block aligner could be adapted to compute the semi-global alignment ratio and the identity of the aligned region? This is what is needed in real-world applications. Any suggestions? Let me know if anything is unclear.
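To make the two numbers I am after concrete, here is a minimal reference sketch in plain Python. It is not block aligner's or the adaptive banded aligner's API. It is an unbanded O(nm) dynamic program with free end gaps on both sequences. The function name `semi_global_stats`, the linear gap penalty, and the scoring values are my own assumptions for illustration, and it would be far too slow for 15 kb reads.

```python
def semi_global_stats(query, target, match=2, mismatch=-3, gap=-4):
    """Semi-global (free end gap) alignment statistics.

    Assumptions for illustration only: linear gap penalty, arbitrary scores.
    Returns the score, the aligned query interval, the aligned ratio
    (aligned query positions / query length) and the identity
    (matches / alignment columns inside the aligned region).
    """
    n, m = len(query), len(target)
    # Row 0 and column 0 stay at 0, so leading end gaps cost nothing.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    move = [[0] * (m + 1) for _ in range(n + 1)]  # 1 diag, 2 up, 3 left

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            best, mv = score[i - 1][j - 1] + sub, 1
            if score[i - 1][j] + gap > best:   # gap in target
                best, mv = score[i - 1][j] + gap, 2
            if score[i][j - 1] + gap > best:   # gap in query
                best, mv = score[i][j - 1] + gap, 3
            score[i][j], move[i][j] = best, mv

    # Trailing end gaps are free too: the alignment may end anywhere on
    # the last row or the last column.
    ends = [(score[n][j], n, j) for j in range(m + 1)]
    ends += [(score[i][m], i, m) for i in range(n + 1)]
    best, i, j = max(ends)
    q_end = i

    # Trace back until a border is reached, counting columns and matches.
    matches = columns = 0
    while i > 0 and j > 0:
        columns += 1
        if move[i][j] == 1:
            matches += query[i - 1] == target[j - 1]
            i, j = i - 1, j - 1
        elif move[i][j] == 2:
            i -= 1
        else:
            j -= 1
    q_start = i

    return {
        "score": best,
        "query_start": q_start,
        "query_end": q_end,
        "columns": columns,
        "aligned_ratio": (q_end - q_start) / n if n else 0.0,
        "identity": matches / columns if columns else 0.0,
    }


if __name__ == "__main__":
    # Query suffix overlaps target prefix; the flanks are unrelated.
    print(semi_global_stats("AAAAAACTGACTGACT", "CTGACTGACTGGGGGG"))
```

What I would like to know is whether block aligner (or adaptive banded DP) can report the same quantities, i.e. the start and end of the aligned region plus the number of matches, so that the ratio and identity can be derived from its output.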

Thanks,

Jianshu

May 01 '24 16:05 jianshu93