strobealign
Map paired-end reads that overlap each other’s 5' ends
As has come up in #317 reported by @y9c, when mapping paired-end reads, we should allow for the case that reads overlap in this way:
R1 -------->
R2 <--------
- Previous discussion of how this can happen
- Issue #317 was about the case that R1 and R2 are reverse complements of each other (and should therefore be mapped to identical locations)
- @y9c pointed out the section in SAM specification about TLEN for this case
Strobealign currently allows these two situations:
- R1 is forward, R2 is reverse, and the leftmost mapped base of R1 is less than the leftmost mapped base of R2 (`R1---> <---R2`)
- R2 is forward, R1 is reverse, and the leftmost mapped base of R2 is less than the leftmost mapped base of R1 (`R2---> <---R1`)
(#317 changes the above to "... is less than or equal to ...")
It appears to me that, for the first situation, we just need to change this to "... leftmost mapped base of R1 is less than the rightmost mapped base of R2", and similarly for the second situation.
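To make the difference between the current and the proposed rule concrete, here is a minimal sketch. It is not strobealign's actual code: the `Alignment` type, its field names, the inclusive coordinates and the `allow_overlap` switch are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    # Hypothetical record; field names are illustrative only.
    leftmost: int     # leftmost mapped reference base (inclusive)
    rightmost: int    # rightmost mapped reference base (inclusive)
    is_reverse: bool  # True if the read maps to the reverse strand


def is_proper_orientation(a: Alignment, b: Alignment, allow_overlap: bool = True) -> bool:
    """Check strand and relative position of a pair (insert size is ignored here)."""
    if a.is_reverse == b.is_reverse:
        return False  # one read must be forward, the other reverse
    fwd, rev = (b, a) if a.is_reverse else (a, b)
    if allow_overlap:
        # Proposed rule: compare against the rightmost base of the reverse read
        return fwd.leftmost < rev.rightmost
    # Current rule: compare against the leftmost base of the reverse read
    return fwd.leftmost < rev.leftmost


# R1 is forward but starts to the right of where reverse R2 starts
r1 = Alignment(leftmost=100, rightmost=149, is_reverse=False)
r2 = Alignment(leftmost=90, rightmost=139, is_reverse=True)
print(is_proper_orientation(r1, r2, allow_overlap=False))
print(is_proper_orientation(r1, r2))
```

With these made-up coordinates, the pair is rejected under the current rule and accepted under the proposed one. Because the function picks out the forward and reverse read itself, the second situation (R2 forward, R1 reverse) is covered by the same comparison.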
When judging whether seeds (not reads) are proper pairs, the above doesn’t quite work because we don’t know what the leftmost or rightmost mapped bases are going to be, mainly because we don’t know how many bases are going to be soft clipped on either side.
In a situation like this (`====` shows seed locations), the seeds would not overlap at all, but the reads could:
R1 -------------------->
====
R2 <------------------------
====
To ensure we don’t mistakenly rule out a pair, we can assume that the alignment extends ungapped to either end of the read. (I believe this is already done for the 5' end at the moment.)
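A rough sketch of that idea for seeds follows. The `Seed` type, its fields and the function names are hypothetical and do not reflect strobealign's internals. It also assumes that `query_start` is the seed's offset within the read as oriented on the reference, and it leaves out any insert-size check.

```python
from dataclasses import dataclass


@dataclass
class Seed:
    # Hypothetical record; field names are illustrative only.
    ref_start: int    # reference position of the seed's first base
    query_start: int  # seed offset within the read, in reference orientation
    is_reverse: bool  # True if the read maps to the reverse strand


def projected_span(seed: Seed, read_length: int) -> tuple[int, int]:
    """Reference span of the read if the alignment were ungapped and unclipped."""
    left = seed.ref_start - seed.query_start
    right = left + read_length - 1
    return left, right


def seeds_may_be_proper_pair(s1: Seed, len1: int, s2: Seed, len2: int) -> bool:
    """Optimistic check: never rules out a pair whose reads could still overlap."""
    if s1.is_reverse == s2.is_reverse:
        return False
    if s1.is_reverse:
        fwd, fwd_len, rev, rev_len = s2, len2, s1, len1
    else:
        fwd, fwd_len, rev, rev_len = s1, len1, s2, len2
    fwd_left, _ = projected_span(fwd, fwd_len)
    _, rev_right = projected_span(rev, rev_len)
    return fwd_left < rev_right


# Seeds as in the diagram: R1's seed lies near its 3' end, R2's seed near its left end
seed1 = Seed(ref_start=118, query_start=18, is_reverse=False)
seed2 = Seed(ref_start=97, query_start=2, is_reverse=True)
print(seeds_may_be_proper_pair(seed1, 24, seed2, 24))
```

Here the forward seed (at 118) lies to the right of the reverse seed (at 97), so comparing seed positions alone would reject the pair. Projecting both reads to their full length gives spans 100-123 and 95-118, which passes the relaxed check.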
I agree with your proposed solutions.