bowtie2 icon indicating copy to clipboard operation
bowtie2 copied to clipboard

In paired end mode, bowtie2 sometimes reports a mate alignment that does not meet the validity scoring threshold

Open barrierbank opened this issue 5 years ago • 2 comments

bowtie2 is reporting some alignments that do not meet the scoring threshold in paired-end mode. We see this in two manifestations:

  1. bowtie2 reports a concordant pair alignment where the AS:i value for one of the mates does not meet the scoring threshold (and so this is actually an invalid alignment).
  2. bowtie2 reports an alternative alignment score (XS:i) for a mate in a valid concordant pair alignment, but the XS:i value is below the scoring threshold for the mate.

Example of number 1: AS:i:44 in the first line; the read length is 76 and the scoring threshold should be 54

NDX550191_RUO:5:HFN53AFXY:1:11108:1346:3854       99      chr9      66806000          22        10S50M16S       =         66806109          181       TGTGTAGTACCCTTTGTTTTGATAGAGCAGTGTTAAAACACAGTTTCTGTGAAATCTGCAAGTGTTCATTTGCAGC    *    AS:i:44 XN:i:0    XM:i:7  XO:i:0  XG:i:0  NM:i:7    MD:Z:5A15T2G6T0C3T4G8   YS:i:66 YT:Z:CP
NDX550191_RUO:5:HFN53AFXY:1:11108:1346:3854       147     chr9    66806109          22        21S41M14S         =         66806000        -181      AAGAATTATCTTCATAGAAACACTAGACAGAAGCATTCTCAGAAACTCCTTTGTGATGTTTGTGTTCAATTCCCCG    *    AS:i:66 XS:i:88 XN:i:0  XM:i:2  XO:i:0    XG:i:0  NM:i:2  MD:Z:4C5G30     YS:i:44 YT:Z:CP

Example of number 2: XS:i:46 in the first line; the read length is 76 and the scoring threshold should be 54

NDX550191_RUO:5:HFN53AFXY:1:11101:1606:11415      83      chr3      146378152         39      66M10S    =       146378063         -155      ACATAAAACTTGCAGGTCTTAAAAAACAATTACACAAGTGGGGGAAAAGAAAGAAATCAAATGGCAGTACTACACA    *    AS:i:124        XS:i:46 XN:i:0  XM:i:1    XO:i:0  XG:i:0  NM:i:1    MD:Z:48A17      YS:i:150       YT:Z:CP
NDX550191_RUO:5:HFN53AFXY:1:11101:1606:11415       163     chr3      146378063         39      75M       =         146378152       155       CCATTAAACCAGCCCTACAAAAAATTCTCAAAGGAGTTCTAAACATGGAAACAAAAGGTTGGTACTAGCCATCAT     *     AS:i:150        XS:i:48 XN:i:0  XM:i:0    XO:i:0  XG:i:0  NM:i:0  MD:Z:75   YS:i:124     YT:Z:CP

Here is our command line: bowtie2 -f --very-sensitive-local -p 24 -x $bowtie2_hg19_index -1 ${output_file}_1_trimmed.fasta -2 ${output_file}_2_trimmed.fasta


Now here is my theory about what is happening. It looks like bowtie2 first finds a valid single-mate alignment and records its AS:i score as the best so far. It then tries to improve on that score. It finds a valid single-mate alignment and uses that as an anchor to look for a concordant pair alignment, however it only requires the total score (sum of both mates' AS:i) for that alignment to exceed the previous best score it found, which was for a single-mate alignment. So, it can find a "concordant" pair alignment with a total score that improves on the previous single-mate alignment, but the alignment scores of the individual mates might not be high enough to make each mate alignment individually a valid alignment.

I have two pieces of evidence to support this:

  1. In every bad pair alignment that was reported, at least one of the pairs had an XS:i tag. This could be referring to the valid single-mate alignment that was then "improved on" by the invalid pair alignment.
  2. It appears to be consistent with the bowtie2 code, SwDriver::extendSeedsPaired (bowtie2-2.3.5 source code, aligner_sw_driver.cpp, line 1967):
// Anchor mate must have score at least 'ps' minus the best possible
// score for the opposite mate.
TAlScore nc = ps - res->alres.score().score();

barrierbank avatar Apr 04 '19 16:04 barrierbank

Thank you for reporting this issue. We are looking into it.

ch4rr0 avatar Apr 12 '19 14:04 ch4rr0

I tried duplicating this issue, but my results differ. Please let me know if I am doing something wrong here. Here's my setup:

Bowtie2 v2.3.5

Index: H. sapiens, hg19 UCSC

Read files:

>r1
TGTGTAGTACCCTTTGTTTTGATAGAGCAGTGTTAAAACACAGTTTCTGTGAAATCTGCAAGTGTTCATTTGCAGC
>r2
TGTGTAGTACTGCCATTTGATTTCTTTCTTTTCCCCCACTTGTGTAATTGTTTTTTAAGACCTGCAAGTTTTATGT
>r1
CGGGGAATTGAACACAAACATCACAAAGGAGTTTCTGAGAATGCTTCTGTCTAGTGTTTCTATGAAGATAATTCTT
>r2
CCATTAAACCAGCCCTACAAAAAATTCTCAAAGGAGTTCTAAACATGGAAACAAAAGGTTGGTACTAGCCATCAT

Results:

@PG	ID:bowtie2	PN:bowtie2	VN:2.3.5	CL:"/home/bowtie2/bowtie2-align-s --very-sensitive-local -x hg19 -f -1 example/reads/read_1.fa -2 example/reads/read_2.fa"
r1	77	*	0	0	*	*	0	0	TGTGTAGTACCCTTTGTTTTGATAGAGCAGTGTTAAAACACAGTTTCTGTGAAATCTGCAAGTGTTCATTTGCAGC	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	YT:Z:UP
r1	141	*	0	0	*	*	0	0	CGGGGAATTGAACACAAACATCACAAAGGAGTTTCTGAGAATGCTTCTGTCTAGTGTTTCTATGAAGATAATTCTT	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	YT:Z:UP

r2	83	chr3	146378152	44	66M10S	=	146378063	-165	ACATAAAACTTGCAGGTCTTAAAAAACAATTACACAAGTGGGGGAAAAGAAAGAAATCAAATGGCAGTACTACACA	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	AS:i:124	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:48A17	YS:i:150	YT:Z:CP
r2	163	chr3	146378063	44	75M	=	146378152	165	CCATTAAACCAGCCCTACAAAAAATTCTCAAAGGAGTTCTAAACATGGAAACAAAAGGTTGGTACTAGCCATCAT	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	AS:i:150	XS:i:82	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:75	YS:i:124	YT:Z:CP

If I go ahead and lower the minimum score --score-min G,1,8 only then does the r2 pair gets assigned XS:i:46 and XS:i:48

@PG	ID:bowtie2	PN:bowtie2	VN:2.3.5	CL:"/home/rcharle2/Development/git/bowtie2/bowtie2-align-s --very-sensitive-local -x hg19 -f -1 example/reads/read_1.fa -2 example/reads/read_2.fa --score-min G,1,8"
r1	77	*	0	0	*	*	0	0	TGTGTAGTACCCTTTGTTTTGATAGAGCAGTGTTAAAACACAGTTTCTGTGAAATCTGCAAGTGTTCATTTGCAGC	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	YT:Z:UP
r1	141	*	0	0	*	*	0	0	CGGGGAATTGAACACAAACATCACAAAGGAGTTTCTGAGAATGCTTCTGTCTAGTGTTTCTATGAAGATAATTCTT	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	YT:Z:UP
r2	83	chr3	146378152	38	66M10S	=	146378063	-165	ACATAAAACTTGCAGGTCTTAAAAAACAATTACACAAGTGGGGGAAAAGAAAGAAATCAAATGGCAGTACTACACA	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	AS:i:124	XS:i:46	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:48A17	YS:i:150	YT:Z:CP
r2	163	chr3	146378063	38	75M	=	146378152	165	CCATTAAACCAGCCCTACAAAAAATTCTCAAAGGAGTTCTAAACATGGAAACAAAAGGTTGGTACTAGCCATCAT	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	AS:i:150	XS:i:48	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:75	YS:i:124	YT:Z:CP

ch4rr0 avatar Sep 14 '20 15:09 ch4rr0