
Guidance on making changes to mismatch tolerance

Open bmorledge-hampton19 opened this issue 2 years ago • 0 comments

I'm hoping to fork bowtie2 and change how mismatches are tolerated in the upfront read alignment and, potentially, in seeding. I am trying to align a set of relatively short reads (between 23 and 31 base pairs, inclusive) that contain known tandem mismatches, i.e. two mismatched bases directly next to each other. Since a seed cannot tolerate more than a single mismatch (-N accepts only 0 or 1) and the tandem mismatch may sit near the middle of the read, I am currently forced to use short, inefficient seed lengths to align these reads.
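To put rough numbers on the seed-length problem, here is a small Python sketch of the arithmetic. It is an idealized model, not bowtie2's actual seed extraction: it ignores the seed interval, strand handling, and any other mismatches in the read, and the function name `longest_seed_window` is made up for illustration. It only asks how long a contiguous stretch of the read can be if it may contain at most `max_mm` of the two tandem mismatched bases.

```python
def longest_seed_window(read_len, first_mm, max_mm=1):
    """Longest contiguous window of a read that covers at most `max_mm`
    of the two tandem mismatches at 0-based positions first_mm and
    first_mm + 1. Hypothetical helper; an idealized model only."""
    if not 0 <= first_mm < read_len - 1:
        raise ValueError("the tandem pair must lie entirely inside the read")
    if max_mm < 0:
        raise ValueError("max_mm must be non-negative")
    if max_mm >= 2:
        # Both mismatched bases are tolerated, so the whole read qualifies.
        return read_len
    clean_left = first_mm                   # matching bases before the pair
    clean_right = read_len - first_mm - 2   # matching bases after the pair
    # A window may extend from the longer clean flank into the pair
    # by exactly as many bases as mismatches are allowed (0 or 1).
    return max(clean_left, clean_right) + max_mm


if __name__ == "__main__":
    for read_len in (23, 31):
        mid = read_len // 2 - 1  # place the pair near the middle of the read
        print(read_len, mid, longest_seed_window(read_len, mid, max_mm=1))
```

Under this model, a tandem mismatch near the center leaves only about half of the read available for a seed that allows one mismatch, which is why tolerating the adjacent pair as a unit would help.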

However, since these mismatches are known to be adjacent to one another, it seems like it should be computationally feasible to allow this special case in seeding or upfront alignment. I'm happy to try to implement this special case myself, but I'm not sure where in the source code to start looking for the algorithms that would need to change. Is there anyone more familiar with the source who would be willing to point me in the right direction?

To reiterate, I am hoping to update the upfront read alignment (the step the documentation describes in relation to the --no-1mm-upfront flag) so that it tolerates 2 mismatches, provided they are adjacent. Since I expect many of my reads to be free of additional mismatches and indels, this should be sufficient for my purposes. I am also interested in making similar changes to the seeding process, but this is lower priority.
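To make the acceptance rule concrete, here is a minimal Python sketch of what I mean. It is a naive linear scan over a plain string, not bowtie2's index-based search, and the names `accepts` and `find_hits` are hypothetical. It assumes the upfront step keeps accepting exact and 1-mismatch end-to-end hits, and additionally accepts hits with exactly two mismatches when they are adjacent.

```python
def mismatch_positions(read, window):
    """0-based positions at which two equal-length sequences differ."""
    return [i for i, (a, b) in enumerate(zip(read, window)) if a != b]


def accepts(read, window):
    """True if `read` aligns end-to-end to `window` with no mismatch,
    a single mismatch, or exactly two mismatches at adjacent positions.
    Hypothetical rule for illustration; no indels are considered."""
    if len(read) != len(window):
        return False
    mm = mismatch_positions(read, window)
    if len(mm) <= 1:
        return True
    return len(mm) == 2 and mm[1] - mm[0] == 1


def find_hits(read, reference):
    """Start offsets in `reference` where `read` is accepted (brute force)."""
    n = len(read)
    return [i for i in range(len(reference) - n + 1)
            if accepts(read, reference[i:i + n])]


if __name__ == "__main__":
    reference = "GGACGTACGTTTGG"
    read = "ACGTTTGTTT"  # reference[2:12] with the bases at offsets 4 and 5 changed
    print(find_hits(read, reference))
```

The point of the sketch is only the rule in `accepts`: the number of mismatches alone is not enough, the two positions must also differ by exactly one.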

Thanks in advance for your help!

bmorledge-hampton19 • May 02 '22 18:05