abPOA

Homopolymer indels not consistently aligned

Open rlorigro opened this issue 1 year ago • 6 comments

Hi, I am trying to get a reasonable alignment in a region that contains some tandem repeats and is flanked by non-repetitive sequence. I can get good (enough) results in the tandem region using these parameters:

abpoa \
-n 10 \
--progressive \
--amb-strand \
-b 1000 \
-r 1 \

However, in the (mostly non-repetitive) flanking region there is a long homopolymer, where I get this result:

TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC--------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGTCTGGGCAACATAGTGAGACATTGTCTCTAC------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGTCTGGGCAACATAGTGAGACATTGTCTCTAC------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTACA-------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC--------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC--------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC---------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC---------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTACA-------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTACA-------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC--------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC--------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC--------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC--------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------AAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC---------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC---------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC--------------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC--------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC--------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC---------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC---------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC---------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC---------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC--------------AAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC---------------AAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC---------------AAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------AAAAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTA-------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTA-------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCAGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTA-------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCAGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTA-------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTA-------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCAGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTA-------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCAGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTA-------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTA-------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTA-------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AAAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------ACAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------ACAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------ACAAAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCT--------------------AC-AAAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC------------------------AAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC----------------------AAAAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC
TTCAAGACCAGCCTGGGCAACATAGTGAGACATTGTCTCTAC------------------------AAAAAAAAAAAAAAAAAAAACACAAAATTAGTCGGGTGTGGTGGTGCC

Here abPOA seems to arbitrarily assign different paths to the same AC prefix that precedes the homopolymer. Do you think this can be resolved with parameter choices, or is this an unavoidable aspect of POA?
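
In case it helps to quantify the inconsistency, here is a minimal sketch that tallies how many distinct layouts occur in a chosen column window of the MSA. It assumes the aligned rows have already been read into a list of equal-length strings. The function name `window_layouts` and the toy rows are hypothetical illustrations: they are not part of abPOA and are not taken from the alignment above.

```python
from collections import Counter


def window_layouts(msa_rows, start, end):
    """Tally the distinct aligned strings seen in columns [start, end)."""
    # All rows of an MSA must span the same number of columns.
    if len({len(row) for row in msa_rows}) > 1:
        raise ValueError("MSA rows must all have the same length")
    # Slice the same column window out of every row and count each variant.
    return Counter(row[start:end] for row in msa_rows)


# Toy rows for illustration only (hypothetical, not real abPOA output).
rows = [
    "CTAC--AAAAC",
    "CTAC--AAAAC",
    "CT--ACAAAAC",
    "CT--AC-AAAC",
]

# Print each layout in the window with the number of rows that share it.
for layout, count in window_layouts(rows, 2, 7).most_common():
    print(count, layout)
```

More than one layout for the columns around the AC prefix would indicate that the same bases were placed on different paths in different rows.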

Thanks

rlorigro, May 31 '23 19:05