bowtie2
bowtie2 copied to clipboard
In paired end mode, bowtie2 sometimes reports a mate alignment that does not meet the validity scoring threshold
bowtie2 is reporting some alignments that do not meet the scoring threshold in paired-end mode. We see this in two manifestations:
- bowtie2 reports a concordant pair alignment where the AS:i value for one of the mates does not meet the scoring threshold (and so this is actually an invalid alignment).
- bowtie2 reports an alternative alignment score (XS:i) for a mate in a valid concordant pair alignment, but the XS:i value is below the scoring threshold for the mate.
Example of number 1: AS:i:44 in the first line; the read length is 76 and the scoring threshold should be 54
NDX550191_RUO:5:HFN53AFXY:1:11108:1346:3854 99 chr9 66806000 22 10S50M16S = 66806109 181 TGTGTAGTACCCTTTGTTTTGATAGAGCAGTGTTAAAACACAGTTTCTGTGAAATCTGCAAGTGTTCATTTGCAGC * AS:i:44 XN:i:0 XM:i:7 XO:i:0 XG:i:0 NM:i:7 MD:Z:5A15T2G6T0C3T4G8 YS:i:66 YT:Z:CP
NDX550191_RUO:5:HFN53AFXY:1:11108:1346:3854 147 chr9 66806109 22 21S41M14S = 66806000 -181 AAGAATTATCTTCATAGAAACACTAGACAGAAGCATTCTCAGAAACTCCTTTGTGATGTTTGTGTTCAATTCCCCG * AS:i:66 XS:i:88 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:4C5G30 YS:i:44 YT:Z:CP
Example of number 2: XS:i:46 in the first line; the read length is 76 and the scoring threshold should be 54
NDX550191_RUO:5:HFN53AFXY:1:11101:1606:11415 83 chr3 146378152 39 66M10S = 146378063 -155 ACATAAAACTTGCAGGTCTTAAAAAACAATTACACAAGTGGGGGAAAAGAAAGAAATCAAATGGCAGTACTACACA * AS:i:124 XS:i:46 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:48A17 YS:i:150 YT:Z:CP
NDX550191_RUO:5:HFN53AFXY:1:11101:1606:11415 163 chr3 146378063 39 75M = 146378152 155 CCATTAAACCAGCCCTACAAAAAATTCTCAAAGGAGTTCTAAACATGGAAACAAAAGGTTGGTACTAGCCATCAT * AS:i:150 XS:i:48 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:75 YS:i:124 YT:Z:CP
Here is our command line:
bowtie2 -f --very-sensitive-local -p 24 -x $bowtie2_hg19_index -1 ${output_file}_1_trimmed.fasta -2 ${output_file}_2_trimmed.fasta
Now here is my theory about what is happening. It looks like bowtie2 first finds a valid single-mate alignment and records its AS:i score as the best so far. It then tries to improve on that score. It finds a valid single-mate alignment and uses that as an anchor to look for a concordant pair alignment, however it only requires the total score (sum of both mates' AS:i) for that alignment to exceed the previous best score it found, which was for a single-mate alignment. So, it can find a "concordant" pair alignment with a total score that improves on the previous single-mate alignment, but the alignment scores of the individual mates might not be high enough to make each mate alignment individually a valid alignment.
I have two pieces of evidence to support this:
- In every bad pair alignment that was reported, at least one of the pairs had an XS:i tag. This could be referring to the valid single-mate alignment that was then "improved on" by the invalid pair alignment.
- It appears to be consistent with the bowtie2 code, SwDriver::extendSeedsPaired (bowtie2-2.3.5 source code, aligner_sw_driver.cpp, line 1967):
// Anchor mate must have score at least 'ps' minus the best possible
// score for the opposite mate.
TAlScore nc = ps - res->alres.score().score();
Thank you for reporting this issue. We are looking into it.
I tried duplicating this issue, but my results differ. Please let me know if I am doing something wrong here. Here's my setup:
Bowtie2 v2.3.5
Index: H. sapiens, hg19 UCSC
Read files:
>r1
TGTGTAGTACCCTTTGTTTTGATAGAGCAGTGTTAAAACACAGTTTCTGTGAAATCTGCAAGTGTTCATTTGCAGC
>r2
TGTGTAGTACTGCCATTTGATTTCTTTCTTTTCCCCCACTTGTGTAATTGTTTTTTAAGACCTGCAAGTTTTATGT
>r1
CGGGGAATTGAACACAAACATCACAAAGGAGTTTCTGAGAATGCTTCTGTCTAGTGTTTCTATGAAGATAATTCTT
>r2
CCATTAAACCAGCCCTACAAAAAATTCTCAAAGGAGTTCTAAACATGGAAACAAAAGGTTGGTACTAGCCATCAT
Results:
@PG ID:bowtie2 PN:bowtie2 VN:2.3.5 CL:"/home/bowtie2/bowtie2-align-s --very-sensitive-local -x hg19 -f -1 example/reads/read_1.fa -2 example/reads/read_2.fa"
r1 77 * 0 0 * * 0 0 TGTGTAGTACCCTTTGTTTTGATAGAGCAGTGTTAAAACACAGTTTCTGTGAAATCTGCAAGTGTTCATTTGCAGC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII YT:Z:UP
r1 141 * 0 0 * * 0 0 CGGGGAATTGAACACAAACATCACAAAGGAGTTTCTGAGAATGCTTCTGTCTAGTGTTTCTATGAAGATAATTCTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII YT:Z:UP
r2 83 chr3 146378152 44 66M10S = 146378063 -165 ACATAAAACTTGCAGGTCTTAAAAAACAATTACACAAGTGGGGGAAAAGAAAGAAATCAAATGGCAGTACTACACA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:124 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:48A17 YS:i:150 YT:Z:CP
r2 163 chr3 146378063 44 75M = 146378152 165 CCATTAAACCAGCCCTACAAAAAATTCTCAAAGGAGTTCTAAACATGGAAACAAAAGGTTGGTACTAGCCATCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:150 XS:i:82 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:75 YS:i:124 YT:Z:CP
If I go ahead and lower the minimum score --score-min G,1,8
only then does the r2
pair gets assigned XS:i:46
and XS:i:48
@PG ID:bowtie2 PN:bowtie2 VN:2.3.5 CL:"/home/rcharle2/Development/git/bowtie2/bowtie2-align-s --very-sensitive-local -x hg19 -f -1 example/reads/read_1.fa -2 example/reads/read_2.fa --score-min G,1,8"
r1 77 * 0 0 * * 0 0 TGTGTAGTACCCTTTGTTTTGATAGAGCAGTGTTAAAACACAGTTTCTGTGAAATCTGCAAGTGTTCATTTGCAGC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII YT:Z:UP
r1 141 * 0 0 * * 0 0 CGGGGAATTGAACACAAACATCACAAAGGAGTTTCTGAGAATGCTTCTGTCTAGTGTTTCTATGAAGATAATTCTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII YT:Z:UP
r2 83 chr3 146378152 38 66M10S = 146378063 -165 ACATAAAACTTGCAGGTCTTAAAAAACAATTACACAAGTGGGGGAAAAGAAAGAAATCAAATGGCAGTACTACACA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:124 XS:i:46 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:48A17 YS:i:150 YT:Z:CP
r2 163 chr3 146378063 38 75M = 146378152 165 CCATTAAACCAGCCCTACAAAAAATTCTCAAAGGAGTTCTAAACATGGAAACAAAAGGTTGGTACTAGCCATCAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:150 XS:i:48 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:75 YS:i:124 YT:Z:CP