Alignment of pairs missing individuals

Open jakewendt opened this issue 3 years ago • 18 comments

I am using bowtie2 to find reads or read pairs that are chimeric, aligning partly to a virus and partly to human. I initially align the data to the virus, trim the virus from the read or pair and then align the remnant to human. I do this both as paired and unpaired data.
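One possible way to recover the non-viral remnant is sketched below. It assumes the viral alignment was run in --local mode, so that unaligned read ends show up as soft clips (the S operations in the SAM CIGAR string). The function name soft_clipped_remnants and the 20-base default minimum are hypothetical and not part of bowtie2. For reverse-strand alignments the printed sequence is the reverse complement of the original read.

```shell
# Print soft-clipped read ends from SAM records on stdin.
# $1 = minimum remnant length to report (hypothetical default: 20).
soft_clipped_remnants() {
  awk -F'\t' -v min="${1:-20}" '
    /^@/ || $6 == "*" { next }            # skip header lines and unaligned records
    {
      cigar = $6; seq = $10
      if (match(cigar, /^[0-9]+S/)) {     # soft clip at the left end of SEQ
        n = substr(cigar, 1, RLENGTH - 1) + 0
        if (n >= min) print $1, "left", substr(seq, 1, n)
      }
      if (match(cigar, /[0-9]+S$/)) {     # soft clip at the right end of SEQ
        n = substr(cigar, RSTART, RLENGTH - 1) + 0
        if (n >= min) print $1, "right", substr(seq, length(seq) - n + 1)
      }
    }
  '
}
```

It could be fed directly from an alignment, for example: bowtie2 -x SVAs_and_HERVs_KWHE --very-sensitive-local -U fastq/reads_R1.fastq | soft_clipped_remnants 25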

This can be slow, so when screening for multiple viruses I build a single index of all the viruses, align the pairs to it, and select every read that either aligns or whose mate aligns, creating a paired subset of the initial data.
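The selection step could look roughly like this, assuming the paired alignment is written as SAM. In SAM, flag bit 0x4 marks a record whose read is unaligned, so any record with that bit clear means at least one mate of the pair aligned. The function name select_aligned_pair_names is hypothetical; this is a sketch, not necessarily the script used here.

```shell
# Print the names of pairs in which at least one mate aligned.
# Reads SAM on stdin and writes one sorted, unique read name per line.
select_aligned_pair_names() {
  awk -F'\t' '
    /^@/ { next }                          # skip SAM header lines
    int($2 / 4) % 2 == 0 { print $1 }      # flag bit 0x4 clear: this mate aligned
  ' | sort -u
}
```

The resulting name list can then be used to pull both mates of each selected pair out of the original FASTQ files, which keeps the subset usable as paired data.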

The problem is that the unpaired alignment of the complete data contains alignments that are not in the selected subset. I have lowered the minimum score threshold used in the selection alignment from the default to "G,1,3" and I get many more alignments. But even with the minimum score set to "C,1,0", those reads still don't align in paired mode.

In my selection process, I want to align paired so that the dataset remains usable as paired data, but I want a pair to be reported even if it's just a small part of one of the reads that aligns.

Any suggestions? Am I missing an option?

bowtie2 -x SVAs_and_HERVs_KWHE --very-sensitive-local -1 fastq/reads_R1.fastq -2 fastq/reads_R2.fastq > paired.sam 
590 reads; of these:
  590 (100.00%) were paired; of these:
    590 (100.00%) aligned concordantly 0 times
    0 (0.00%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    590 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    590 pairs aligned 0 times concordantly or discordantly; of these:
      1180 mates make up the pairs; of these:
        1180 (100.00%) aligned 0 times
        0 (0.00%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
0.00% overall alignment rate

bowtie2 -x SVAs_and_HERVs_KWHE --very-sensitive-local -U fastq/reads_R1.fastq -U fastq/reads_R2.fastq > unpaired.sam 
1180 reads; of these:
  1180 (100.00%) were unpaired; of these:
    579 (49.07%) aligned 0 times
    4 (0.34%) aligned exactly 1 time
    597 (50.59%) aligned >1 times
50.93% overall alignment rate

jakewendt, Nov 17 '20 23:11

It seems that -L and -i are key.

Replacing --very-sensitive-local with --local -D 20 -R 3 -N 0 -L 10 -i S,1,0.25 is making a huge difference.

jakewendt, Nov 18 '20 15:11

Using --local -D 30 -R 5 -N 0 -L 10 -i S,1,0.15 --score-min G,1,1 instead of just --very-sensitive-local includes all reads, so I guess I need to scale it back; otherwise my selection will include every read.
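To see what these settings mean numerically, here is a small sketch that evaluates bowtie2's function strings as the manual defines them: C is constant, L is linear, S is a square root and G is a natural log of the read length. The helper name bt2_func and the read length of 101 are assumptions for illustration. The manual also says a seed interval below 1 is raised to 1.

```shell
# Evaluate a Bowtie 2 function string "<type>,<a>,<b>" at x (the read length).
# Types per the manual: C constant, L linear, S square root, G natural log.
bt2_func() {
  awk -v spec="$1" -v x="$2" 'BEGIN {
    split(spec, p, ",")
    if      (p[1] == "C") v = p[2] + 0
    else if (p[1] == "L") v = p[2] + p[3] * x
    else if (p[1] == "S") v = p[2] + p[3] * sqrt(x)
    else if (p[1] == "G") v = p[2] + p[3] * log(x)
    else { print "unknown function type: " p[1] > "/dev/stderr"; exit 1 }
    printf "%.2f\n", v
  }'
}

# Seed interval (-i); 101 is an assumed read length
bt2_func S,1,0.50 101   # the -i setting implied by --very-sensitive-local
bt2_func S,1,0.15 101
bt2_func S,1,0    101   # → 1.00 for any read length: a seed at every position

# Minimum score (--score-min) in local mode
bt2_func G,20,8   101   # the local-mode default
bt2_func G,1,1    101
```

With the local-mode match bonus of 2 per base, a threshold as low as the one G,1,1 produces is cleared by a very short match, which would be consistent with every read being reported.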

jakewendt, Nov 18 '20 16:11

Ultimately going with --local -D 85 -R 5 -N 0 -L 10 -i S,1,0 for the moment.

Does anyone understand how the aligning algorithm differs when run paired vs unpaired?

jakewendt, Nov 18 '20 17:11

Hi @jakewendt ,

I have a similar issue. It appears that bowtie2 mixed mode is disabled. According to the documentation: "If Bowtie 2 cannot find a paired-end alignment for a pair, by default it will go on to look for unpaired alignments for the constituent mates. This is called 'mixed mode.'" You can reproduce this bug with these commands:

# Download the human genome 
$ wget https://genome-idx.s3.amazonaws.com/bt/GRCh38_noalt_as.zip
# Decompress it 
$ unzip GRCh38_noalt_as.zip
# Read pairs for which only one mate aligns on the human genome
$ cat 1.fastq
@SRR2221473.42/1
GGCAACAAGAGTGAAACTCCATCTCAAAAAAAAAAAATATATATATATATGTGTGTATATATATATGTATATATATGTGTGTATATATATATGTATATATA
+
CCCFFFFFHHHCFIJIJJJJJJJJJJJJJIJJJJJJJCEEECEHFFFEEFFCDCCBCEDCDCDDDFD@CDCCDDDCDCCCBBDDDDDCDDCDECCDEEEE3
@SRR2221473.595/1
TCCATTGCATTCCATTCCATTCCATTCCATTCCAATCCGTTGCATTCCATTCCATTACATTCGGATTGATTCTATTCAACTCCCTTACTCTCCATTACATT
+
CCCFFFFFHHHHHJJJJJJJIJJJJJJJIJJJJIJJJIJJJGIIIIJJJJJJIIJJJJJJJJGJJJIJJJJIIJJIJJJJIIJIJJHHFHGHDEFFFDFE;
@SRR2221473.766/1
AGTGAAATGGAATGGAAGGGAATGGAATGAAATTGAATGGAATGGAATGGAATCAACCCGAGTGCAATGTAATGGAATTGAATGGAATGGAATGGAATGGA
+
@@@FFFFFHHHFHJJHGIIJJIIJJJJGJJJJJFIJJJIJGIIIIIJJJIJJIJIJJIIJGIIIJGGEHCHFFHF@DFFEEEEEDECDCCCCCCDDCACDC
$ cat 2.fastq
@SRR2221473.42/2
ACATATATATATACACACATATATATACATATATATATACACACATATATATACATATATATATACACACATATATATATATATTTTTTTTTTTTGAGATG
+
CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIIJGIIIHJJJIJIIIIJIIJJIJJGIJIIJIJIIIJIJJIHDDDDDD5<?BDC
@SRR2221473.595/2
GGAACAACCTGAATGGAATGGAATGTAATGGAGAGTAAGGGAGTTGAATAGAATCAATCCGAATGTAATGGAATGGAATGCAACGGATTGGAATGGAATGG
+
B@@FFFFFHHHHHGJJGIGIJJJJJHIJJJJIIGI?FHIJJHIFHIJJIJHIJJJJJJJJJJIJJJJJJGHHHHHGFEFFFEEEDDDDDDDDDDCDDDDD>
@SRR2221473.766/2
CATTCCATTCCTTTCCATTCTATTAGGGTTAATTCCATTCCATTCCATTCCATTCCATTCAATTCCATTCCATTCTATTCCATTGCAATCGAGTTGATTCC
+
CCCFFFFFHHHHHJJJIJJIJIJJJJJJEFHIJJGGIJJJJJJIIJJJJJHIIJJJJJJIJJJJHIIIGIIIIIJDIIJJIIIJGEIHHIHGIJIIJHHE3
# Unpaired alignment. All forward reads are aligned
$ bowtie2 -U 1.fastq,2.fastq -x GRCh38_noalt_as/GRCh38_noalt_as -p 32 --no-unal --no-sq --no-hd
6 reads; of these:
  6 (100.00%) were unpaired; of these:
    3 (50.00%) aligned 0 times
    3 (50.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
50.00% overall alignment rate
SRR2221473.42/1 16      chr5    124767433       24      65M1D1M1D35M    *       0       0       TATATATACATATATATATACACACATATATATACATATATATATACACACATATATATATATATTTTTTTTTTTTGAGATGGAGTTTCACTCTTGTTGCC   3EEEEDCCEDCDDCDDDDDBBCCCDCDDDCCDC@DFDDDCDCDECBCCDCFFEEFFFHECEEECJJJJJJJIJJJJJJJJJJJJJIJIFCHHHFFFFFCCC        AS:i:-22        XN:i:0  XM:i:1  XO:i:2  XG:i:2  NM:i:3  MD:Z:65^A1^A15A19       YT:Z:UU
SRR2221473.595/1        0       chr5    49659837        0       101M    *       0       0       TCCATTGCATTCCATTCCATTCCATTCCATTCCAATCCGTTGCATTCCATTCCATTACATTCGGATTGATTCTATTCAACTCCCTTACTCTCCATTACATT   CCCFFFFFHHHHHJJJJJJJIJJJJJJJIJJJJIJJJIJJJGIIIIJJJJJJIIJJJJJJJJGJJJIJJJJIIJJIJJJJIIJIJJHHFHGHDEFFFDFE;        AS:i:-45        XN:i:0  XM:i:8  XO:i:0  XG:i:0  NM:i:8  MD:Z:34T6C14C7G14T3T4A7C4       YT:Z:UU
SRR2221473.766/1        0       chr17_KI270729v1_random 25552   0       101M    *       0       0       AGTGAAATGGAATGGAAGGGAATGGAATGAAATTGAATGGAATGGAATGGAATCAACCCGAGTGCAATGTAATGGAATTGAATGGAATGGAATGGAATGGA   @@@FFFFFHHHFHJJHGIIJJIIJJJJGJJJJJFIJJJIJGIIIIIJJJIJJIJIJJIIJGIIIJGGEHCHFFHF@DFFEEEEEDECDCCCCCCDDCACDC        AS:i:-54        XN:i:0  XM:i:10 XO:i:0  XG:i:0  NM:i:10 MD:Z:1A2G12T29A11T4G4G8G7C6T7   YT:Z:UU

# Paired alignment. Bowtie2 does not switch to mixed mode
$ bowtie2 -1 1.fastq -2 2.fastq -x GRCh38_noalt_as/GRCh38_noalt_as -p 32 --no-unal --no-sq --no-hd
3 reads; of these:
  3 (100.00%) were paired; of these:
    3 (100.00%) aligned concordantly 0 times
    0 (0.00%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    3 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    3 pairs aligned 0 times concordantly or discordantly; of these:
      6 mates make up the pairs; of these:
        6 (100.00%) aligned 0 times
        0 (0.00%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
0.00% overall alignment rate
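A quick way to check whether mixed mode produced anything is to tally the YT:Z tag bowtie2 attaches to each SAM record. Per the manual, CP marks a concordant pair, DP a discordant pair, UP a mate from a pair that failed to align as a pair, and UU a read that was not part of a pair. The function name count_by_pair_type is hypothetical; this is only a sketch.

```shell
# Tally aligned SAM records by Bowtie 2's YT:Z tag. Reads SAM on stdin.
count_by_pair_type() {
  awk -F'\t' '
    /^@/ { next }                          # skip SAM header lines
    int($2 / 4) % 2 == 1 { next }          # flag bit 0x4 set: record is unaligned
    {
      for (i = 12; i <= NF; i++)           # optional fields start at column 12
        if ($i ~ /^YT:Z:/) n[substr($i, 6)]++
    }
    END { for (t in n) print t, n[t] }
  ' | sort
}
```

If mixed mode is working on pairs like these, a paired run piped through this function should report a non-zero UP count.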



fplazaonate, Jan 07 '21 09:01

Something happened between versions 2.1.0 and 2.3.4.1

jakewendt, Jan 07 '21 15:01

version 2.1.0 has the same behaviour on my side

$ bowtie2-2.1.0/bowtie2 -1 1.fastq -2 2.fastq -x GRCh38_noalt_as/GRCh38_noalt_as -p 32 -S /dev/null
3 reads; of these:
  3 (100.00%) were paired; of these:
    3 (100.00%) aligned concordantly 0 times
    0 (0.00%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    3 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    3 pairs aligned 0 times concordantly or discordantly; of these:
      6 mates make up the pairs; of these:
        6 (100.00%) aligned 0 times
        0 (0.00%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
0.00% overall alignment rate
$ bowtie2-2.1.0/bowtie2 -U 1.fastq,2.fastq -x GRCh38_noalt_as/GRCh38_noalt_as -p 32 -S /dev/null
6 reads; of these:
  6 (100.00%) were unpaired; of these:
    3 (50.00%) aligned 0 times
    3 (50.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
50.00% overall alignment rate

fplazaonate, Jan 07 '21 15:01

I have been looking into this issue, but have not found the cause. I have confirmed that mixed mode is enabled, and some of the alignments can be found by including the -a option in the command line. I will continue looking and update this thread when I have more information.

$ ./bowtie2-align-s -2 read1.fq -1 read2.fq -x GRCh38_noalt_as/GRCh38_noalt_as  --no-unal --no-sq --no-hd -a
3 reads; of these:
  3 (100.00%) were paired; of these:
    1 (33.33%) aligned concordantly 0 times
    1 (33.33%) aligned concordantly exactly 1 time
    1 (33.33%) aligned concordantly >1 times
    ----
    1 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    1 pairs aligned 0 times concordantly or discordantly; of these:
      2 mates make up the pairs; of these:
        1 (50.00%) aligned 0 times
        1 (50.00%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
83.33% overall alignment rate

ch4rr0, Jan 19 '21 16:01

Thank you very much. This will be very helpful.

fplazaonate, Jan 19 '21 17:01

Here's the output from a variety of versions ...

module load bowtie2/2.3.4.1

bowtie2 --version
/home/shared/cbc/software_cbc/bowtie2-2.3.4.1/bowtie2-align-s version 2.3.4.1
64-bit
Built on 14231912a8bd
Sat Feb  3 13:04:04 UTC 2018
Compiler: gcc version 4.8.2 20140120 (Red Hat 4.8.2-15) (GCC) 
Options: -O3 -m64 -msse2 -funroll-loops -g3 -g -O2 -fvisibility=hidden -I/hbb_exe/include  -std=c++98 -DPOPCNT_CAPABILITY -DWITH_TBB -DNO_SPINLOCK -DWITH_QUEUELOCK=1
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

bowtie2 -x SVAs_and_HERVs_KWHE --very-sensitive-local -1 fastq/reads_R1.fastq -2 fastq/reads_R2.fastq > /dev/null
590 reads; of these:
  590 (100.00%) were paired; of these:
    590 (100.00%) aligned concordantly 0 times
    0 (0.00%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    590 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    590 pairs aligned 0 times concordantly or discordantly; of these:
      1180 mates make up the pairs; of these:
        1180 (100.00%) aligned 0 times
        0 (0.00%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
0.00% overall alignment rate

module load bowtie2/2.3.3.1

The following have been reloaded with a version change:
  1) bowtie2/2.3.4.1 => bowtie2/2.3.3.1

bowtie2 --version
/home/shared/cbc/software_cbc/bowtie2-2.3.3.1/bowtie2-align-s version 2.3.3.1
64-bit
Built on c1045ed0e5f3
Thu Oct  5 16:59:35 UTC 2017
Compiler: gcc version 4.8.2 20140120 (Red Hat 4.8.2-15) (GCC) 
Options: -O3 -m64 -msse2 -funroll-loops -g3 -DPOPCNT_CAPABILITY -DWITH_TBB -DNO_SPINLOCK -DWITH_QUEUELOCK=1
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

bowtie2 -x SVAs_and_HERVs_KWHE --very-sensitive-local -1 fastq/reads_R1.fastq -2 fastq/reads_R2.fastq > /dev/null
590 reads; of these:
  590 (100.00%) were paired; of these:
    563 (95.42%) aligned concordantly 0 times
    7 (1.19%) aligned concordantly exactly 1 time
    20 (3.39%) aligned concordantly >1 times
    ----
    563 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    563 pairs aligned 0 times concordantly or discordantly; of these:
      1126 mates make up the pairs; of these:
        604 (53.64%) aligned 0 times
        16 (1.42%) aligned exactly 1 time
        506 (44.94%) aligned >1 times
48.81% overall alignment rate

module load bowtie2/2.2.9

The following have been reloaded with a version change:
  1) bowtie2/2.3.3.1 => bowtie2/2.2.9

bowtie2 --version
/home/shared/cbc/software_cbc/bowtie2-2.2.9/bowtie2-align-s version 2.2.9
64-bit
Built on localhost.localdomain
Thu Apr 21 18:36:37 EDT 2016
Compiler: gcc version 4.1.2 20080704 (Red Hat 4.1.2-54)
Options: -O3 -m64 -msse2  -funroll-loops -g3 -DPOPCNT_CAPABILITY
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

bowtie2 -x SVAs_and_HERVs_KWHE --very-sensitive-local -1 fastq/reads_R1.fastq -2 fastq/reads_R2.fastq > /dev/null
590 reads; of these:
  590 (100.00%) were paired; of these:
    553 (93.73%) aligned concordantly 0 times
    11 (1.86%) aligned concordantly exactly 1 time
    26 (4.41%) aligned concordantly >1 times
    ----
    553 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    553 pairs aligned 0 times concordantly or discordantly; of these:
      1106 mates make up the pairs; of these:
        594 (53.71%) aligned 0 times
        15 (1.36%) aligned exactly 1 time
        497 (44.94%) aligned >1 times
49.66% overall alignment rate

module load bowtie2/2.2.6

The following have been reloaded with a version change:
  1) bowtie2/2.2.9 => bowtie2/2.2.6

bowtie2 --version
/home/shared/cbc/software_cbc/bowtie2-2.2.6/bowtie2-align-s version 2.2.6
64-bit
Built on localhost.localdomain
Wed Jul 22 16:18:32 EDT 2015
Compiler: gcc version 4.1.2 20080704 (Red Hat 4.1.2-54)
Options: -O3 -m64 -msse2  -funroll-loops -g3 -DPOPCNT_CAPABILITY
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

bowtie2 -x SVAs_and_HERVs_KWHE --very-sensitive-local -1 fastq/reads_R1.fastq -2 fastq/reads_R2.fastq > /dev/null
590 reads; of these:
  590 (100.00%) were paired; of these:
    553 (93.73%) aligned concordantly 0 times
    11 (1.86%) aligned concordantly exactly 1 time
    26 (4.41%) aligned concordantly >1 times
    ----
    553 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    553 pairs aligned 0 times concordantly or discordantly; of these:
      1106 mates make up the pairs; of these:
        594 (53.71%) aligned 0 times
        15 (1.36%) aligned exactly 1 time
        497 (44.94%) aligned >1 times
49.66% overall alignment rate

module load bowtie2/2.1.0

The following have been reloaded with a version change:
  1) bowtie2/2.2.6 => bowtie2/2.1.0

bowtie2 --version
/home/shared/cbc/software_cbc/bowtie2-2.1.0/bowtie2-align version 2.1.0
64-bit
Built on do-dmxp-mac.win.ad.jhu.edu
Tue Feb 26 13:34:02 EST 2013
Compiler: gcc version 4.1.2 20080704 (Red Hat 4.1.2-54)
Options: -O3 -m64 -msse2 -funroll-loops -g3 
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

bowtie2 -x SVAs_and_HERVs_KWHE --very-sensitive-local -1 fastq/reads_R1.fastq -2 fastq/reads_R2.fastq > /dev/null
590 reads; of these:
  590 (100.00%) were paired; of these:
    553 (93.73%) aligned concordantly 0 times
    11 (1.86%) aligned concordantly exactly 1 time
    26 (4.41%) aligned concordantly >1 times
    ----
    553 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    553 pairs aligned 0 times concordantly or discordantly; of these:
      1106 mates make up the pairs; of these:
        594 (53.71%) aligned 0 times
        15 (1.36%) aligned exactly 1 time
        497 (44.94%) aligned >1 times
49.66% overall alignment rate
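As a cross-check on these summaries, the overall alignment rate of a paired run appears to be reproducible from the counts: each pair that aligned concordantly or discordantly contributes two reads, and each mate that aligned on its own contributes one. This formula is inferred from the numbers printed above, not taken from the bowtie2 source, and the function name overall_rate is hypothetical.

```shell
# overall_rate PAIRS CONC_1 CONC_MULTI DISC MATES_1 MATES_MULTI
# Recompute the "overall alignment rate" line of a paired bowtie2 summary.
overall_rate() {
  awk -v pairs="$1" -v c1="$2" -v cn="$3" -v d="$4" -v m1="$5" -v mn="$6" 'BEGIN {
    aligned = 2 * (c1 + cn + d) + m1 + mn    # pairs count twice, lone mates once
    printf "%.2f%%\n", 100 * aligned / (2 * pairs)
  }'
}

overall_rate 590 7 20 0 16 506    # 2.3.3.1 counts above; prints 48.81%
overall_rate 590 11 26 0 15 497   # 2.2.9, 2.2.6 and 2.1.0 counts; prints 49.66%
```

Both values match the rates reported by the corresponding versions.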

jakewendt, Jan 19 '21 17:01

Hello @jakewendt,

Your inputs show differences in behavior between versions. Would it be possible to share these files so that I can recreate the issue?

ch4rr0, Jan 21 '21 15:01

Here are the reads and index.

bowtie2-testing.tar.gz

jakewendt, Jan 21 '21 17:01

I pushed a potential fix for the issue to the bug_fixes branch. Here's my output:

$ ./bowtie2-align-s-debug --version
Warning: Running in debug mode.  Please use debug mode only for diagnosing errors, and not for typical use of Bowtie 2.
./bowtie2-align-s-debug version 2.4.1
64-bit
Built on
Fri Jan 22 16:25:31 UTC 2021
Compiler: InstalledDir: /usr/bin
Options: -O0 -g3 -msse2 -std=c++11 -DPOPCNT_CAPABILITY
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

$ ./bowtie2-align-s-debug --very-sensitive-local -x bowtie2-testing/SVAs_and_HERVs_KWHE -1 bowtie2-testing/reads_R1.fastq.gz -2 bowtie2-testing/reads_R2.fastq.gz  > /dev/null
Warning: Running in debug mode.  Please use debug mode only for diagnosing errors, and not for typical use of Bowtie 2.
590 reads; of these:
  590 (100.00%) were paired; of these:
    563 (95.42%) aligned concordantly 0 times
    7 (1.19%) aligned concordantly exactly 1 time
    20 (3.39%) aligned concordantly >1 times
    ----
    563 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    563 pairs aligned 0 times concordantly or discordantly; of these:
      1126 mates make up the pairs; of these:
        604 (53.64%) aligned 0 times
        16 (1.42%) aligned exactly 1 time
        506 (44.94%) aligned >1 times
48.81% overall alignment rate

ch4rr0, Jan 22 '21 16:01

Thanks. No improvements on my side.

$ bowtie2-bug_fixes/bowtie2 -U  1.fastq,2.fastq -x GRCh38_noalt_as/GRCh38_noalt_as -p 32 --no-unal --no-sq --no-hd -S /dev/null
6 reads; of these:
  6 (100.00%) were unpaired; of these:
    3 (50.00%) aligned 0 times
    3 (50.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
50.00% overall alignment rate

$ bowtie2-bug_fixes/bowtie2 -1 1.fastq -2 2.fastq -x GRCh38_noalt_as/GRCh38_noalt_as -p 32 --no-unal --no-sq --no-hd -S /dev/null
3 reads; of these:
  3 (100.00%) were paired; of these:
    3 (100.00%) aligned concordantly 0 times
    0 (0.00%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    3 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    3 pairs aligned 0 times concordantly or discordantly; of these:
      6 mates make up the pairs; of these:
        6 (100.00%) aligned 0 times
        0 (0.00%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
0.00% overall alignment rate

fplazaonate, Jan 22 '21 17:01

@fplaza, this commit won't fix the issue completely. It was mostly directed at @jakewendt's issue. I should have made that clear in my last message.

ch4rr0, Jan 22 '21 17:01

Thank you.

jakewendt, Jan 22 '21 17:01

Hello @ch4rr0 ,

Any news about the investigation of this issue?

Thanks

fplazaonate, Jul 04 '22 13:07

> Ultimately going with --local -D 85 -R 5 -N 0 -L 10 -i S,1,0 for the moment.
>
> Does anyone understand how the aligning algorithm differs when run paired vs unpaired?

Hello, did you make any progress on this? I am exploring and searching the differences, too.

c4-driod, Jun 02 '23 07:06

Upgrading to the latest version mostly resolved my issues

jakewendt, Jun 02 '23 14:06