bowtie2
bowtie2 copied to clipboard
feature request: functionality like "bwa mem -C" to move fastq comments to sam tags
Hi, it would be great if Bowtie 2 could move fastq read header comments to sam tags, similar to the "-C" function of BWA MEM. Although Bowtie 2 supports alignment of ubams (unlike BWA), which would preserve tags that already exist, unfortunately I must use fastqs as Bowtie input due to necessary upstream processing tools, which means moving tags I need to preserve to fastq header comments using samtools fastq -T.
It looks like others are also interested in this functionality: https://www.biostars.org/p/405778/
Thanks!
This one did not require much effort so I went ahead and put together a quick implementation of what I think you're asking for. Feedback is appreciated.
Input
==> example/reads/reads_1.fq <==
@r1 BC:Z:CGTAC
TGAATGCGAACTCCGGGACGCTCAGTAATGTGACGATAGCTGAAAACTGTACGATAAACNGTACGCTGAGGGCAGAAAAAATCGTCGGGGACATTNTAAAGGCGGCGAGCGCGGCTTTTCCG
+
+"@6<:27(F&5)9)"B:%B+A-%5A?2$HCB0B+0=D<7E/<.03#!.F77@6B==?C"7>;))%;,3-$.A06+<-1/@@?,26">=?*@'0;$:;??G+:#+(A?9+10!8!?()?7C>
==> example/reads/reads_2.fq <==
@r1 BC:Z:CGTAG
NAACGTCAGCGGAAGCACCACTATCTGGCGATCAAAAGGATGGTCATCGGTCACGGTGACAGTACGGGTACCTGACGGCCAGTCCACACTGCTTTCACGCTGGCGCGGNAAAGCCGCGCTCGCCGCCTNTACNATGTCCCCGACGATTTTTTCCGCCCTCAGCGTACCGTTTATCGTACAGTTTTCAGCTATCGTCAC
+
&?$F&9AC8H49@5<*>=E;2/G+;9?&8D.;E!2:0B/)++#F-><E,-/&F:&/#B=*'C?@$0!+#1?.ABC?:-?9GA?,11.2>3$&@-'6;!6@D?&#*E:C#9-4&8?"))H>H9-->((E99;:@::1=1-!+$@$8+#@%(<B,+%6!>@0G*D5A'(>9AC<;@01:7@:@0D&D=@:>7"1)D+9+B
Command line:
./bowtie2-align-s -x example/index/lambda_virus -1 example/reads/reads_1.fq -2 example/reads/reads_2.fq -u1 --sam-append-comment
Output:
1 reads; of these:
1 (100.00%) were paired; of these:
0 (0.00%) aligned concordantly 0 times
1 (100.00%) aligned concordantly exactly 1 time
0 (0.00%) aligned concordantly >1 times
----
0 pairs aligned concordantly 0 times; of these:
0 (0.00%) aligned discordantly 1 time
----
0 pairs aligned 0 times concordantly or discordantly; of these:
0 mates make up the pairs; of these:
0 (0.00%) aligned 0 times
0 (0.00%) aligned exactly 1 time
0 (0.00%) aligned >1 times
100.00% overall alignment rate
@HD VN:1.0 SO:unsorted
@SQ SN:gi|9626243|ref|NC_001416.1| LN:48502
@PG ID:bowtie2 PN:bowtie2 VN:2.4.1 CL:"/home/rcharles/Development/git/bowtie2/bowtie2-align-s -x example/index/lambda_virus -1 example/reads/reads_1.fq -2 example/reads/reads_2.fq -u1 --sam-append-comment"
r1 99 gi|9626243|ref|NC_001416.1| 18401 42 122M = 18430 227 TGAATGCGAACTCCGGGACGCTCAGTAATGTGACGATAGCTGAAAACTGTACGATAAACNGTACGCTGAGGGCAGAAAAAATCGTCGGGGACATTNTAAAGGCGGCGAGCGCGGCTTTTCCG +"@6<:27(F&5)9)"B:%B+A-%5A?2$HCB0B+0=D<7E/<.03#!.F77@6B==?C"7>;))%;,3-$.A06+<-1/@@?,26">=?*@'0;$:;??G+:#+(A?9+10!8!?()?7C> AS:i:-5 XN:i:0 XM:i:3 XO:i:0 XG:i:0 NM:i:3 MD:Z:59G13G21G26 YS:i:-4 YT:Z:CP BC:Z:CGTAC
r1 147 gi|9626243|ref|NC_001416.1| 18430 42 198M = 18401 -227 GTGACGATAGCTGAAAACTGTACGATAAACGGTACGCTGAGGGCGGAAAAAATCGTCGGGGACATNGTANAGGCGGCGAGCGCGGCTTTNCCGCGCCAGCGTGAAAGCAGTGTGGACTGGCCGTCAGGTACCCGTACTGTCACCGTGACCGATGACCATCCTTTTGATCGCCAGATAGTGGTGCTTCCGCTGACGTTN B+9+D)1"7>:@=D&D0@:@7:10@;<CA9>('A5D*G0@>!6%+,B<(%@#+8$@$+!-1=1::@:;99E((>--9H>H))"?8&4-9#C:E*#&?D@6!;6'-@&$3>2.11,?AG9?-:?CBA.?1#+!0$@?C'*=B#/&:F&/-,E<>-F#++)/B0:2!E;.D8&?9;+G/2;E=>*<5@94H8CA9&F$?& AS:i:-4 XN:i:0 XM:i:4 XO:i:0 XG:i:0 NM:i:4 MD:Z:65T3A19T107T0 YS:i:-5 YT:Z:CP BC:Z:CGTAG
This looks like it does exactly what I had in mind, thanks! Can it handle multiple comments/tags?
It should, as it considers everything after the first space in the read name to be a comment. It’s up to the user to make sure that the comment is formatted correctly in order to produce output compliant with the SAM spec.
We have included this feature in our most recent release, v2.4.2. Thank you for putting in the request.
I am trying this new feature, but I am hitting a problem when running --un
with --sam-append-comment
. When you specify to output unmapped reads to a new fastq, the resulting sam file does not have appended comments.
@PG ID:bowtie2 PN:bowtie2 VN:2.4.4 CL:"/usr/local/bin/bowtie2-align-s --wrapper basic-0 --rg-id BMG --rg SM:634929898631_Cut_0 --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder --sam-append-comment -p 12 -x Bowtie2Index//genome -U read.fastq.gz"
NB501381:434:H5732BGXK:4:23606:22621:3148 16 chr8 124684412 42 147M * 0 0 AAAAAGCAAAATGAGCCTCCAGTTGGGTTGTGGCTGGCCATGCCCCCCACACGGCACAGCACTTACCTAATGTGAGGCTGCAGGCCGGGCTGCTCTTCATAGCCGTCTATGAGGAAAGGCAGGTTCTTGGCCCAGTGGTCCTTGAAG /EEAEEAEEA</<A<<//AAEAE</AE/EA<E<EA/AA/EEEEAEE<E/AEE<6EAEE<EE/AE<EAEEAEEAEEE<EAEE/AEA<AA<AEAEEEEE/A/A<EAE/EAE<EEEEEEEEEEEEE/EEE/EEEEEEEEAEEE6EEEEEA AS:i:-3 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:0C146 YT:Z:UU RG:Z:BMG BC:Z:AGTTGCCG+TTGGCATC ZA:Z:TGGT ZB:Z:AAGT RX:Z:TGG-AAG QX:Z:AAA AA/
NB501381:434:H5732BGXK:2:11302:5169:4814 0 chr7 22149254 1 147M * 0 0 TACACAAACAGGACCCAACATTCTGCTGCAAACAGGAAACACATCTCAAGGAAAAAGACAGACATTACCTCAGAGTGAAAGGCTGGAAAACAATTTTCCAAGCAAATGGTCTGAAGAAACAAGCTGGAGTAGCCATTCTAATATCAC AEEEEEE//EEEE/<//EEEAAAAE/EEA//E<E/AEAEEE//AE6EEE//EEEEE/AAE<EAEA/E<EE<AEAEE/EEE<//AEAEE/EEEEAAEAEAEE/EAEEE6EEAEE///<E/<AA/AAE/A/E<<<<6AAEE/EAEEAA/ AS:i:-9 XS:i:-9 XN:i:0 XM:i:3 XO:i:0 XG:i:0 NM:i:3 MD:Z:29T0T115G0 YT:Z:UU RG:Z:BMG BC:Z:AGTTGCCG+TTGGCATC ZA:Z:CTGT ZB:Z:GAACT RX:Z:CTG-GAA QX:Z:/AA /AA
NB501381:434:H5732BGXK:3:21504:13617:9739 4 * 0 0 * * 0 0 GATCTCACTTTGTAGCCTTGGCTAGCTTCAAATTCATGATCGATCCAGAAGAGAGAAATGATGGCAAGAACTTTTATCTGGGGAGGTGGGGGTATGAAGAGGGGGTTGTCTACGGTTGTACATACACCTGCACATTAGCCTAAGCCT AEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEE/EEEEEEEEEEEEEEEEAAEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEAEEEEEAEEEEEEE/A<<AEEAEAEAEAEAAEEEA<A/ YT:Z:UU RG:Z:BMG BC:Z:AGTTGCCG+TTGGCATC ZA:Z:CTGT ZB:Z:GAGT RX:Z:CTG-GAG QX:Z:AAA AAA
NB501381:434:H5732BGXK:4:13602:12680:1470 4 * 0 0 * * 0 0 TCTCAGAACTGTATCTTGCCAAAGCCTTTGCCTAACGAGGCTTTGAACTCAGGAGGCTCTTTCCTGCTAAAGCAAAATTAAATGAAATCTCTCACTGATCGATCTGTATCTTGTTCTTTGTTCTTCTAGAGCCCTAGTACATTGCAA AEEEEEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEA/EEEEAEEEEEAEEAEEEAEEEEEEEEEEEA6<<AAAE<A/<E6A< YT:Z:UU RG:Z:BMG BC:Z:AGTTGCCG+TTGGCATC ZA:Z:GTCT ZB:Z:CCTGT RX:Z:GTC-CCT QX:Z:AAA AAA
@PG ID:bowtie2 PN:bowtie2 VN:2.4.4 CL:"/usr/local/bin/bowtie2-align-s --wrapper basic-0 --rg-id BMG --rg SM:634929898631_Cut_0 --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder --sam-append-comment -p 12 -x Bowtie2Index//genome --passthrough -U read.fastq.gz"
NB501381:434:H5732BGXK:4:23606:22621:3148 16 chr8 124684412 42 147M * 0 0 AAAAAGCAAAATGAGCCTCCAGTTGGGTTGTGGCTGGCCATGCCCCCCACACGGCACAGCACTTACCTAATGTGAGGCTGCAGGCCGGGCTGCTCTTCATAGCCGTCTATGAGGAAAGGCAGGTTCTTGGCCCAGTGGTCCTTGAAG /EEAEEAEEA</<A<<//AAEAE</AE/EA<E<EA/AA/EEEEAEE<E/AEE<6EAEE<EE/AE<EAEEAEEAEEE<EAEE/AEA<AA<AEAEEEEE/A/A<EAE/EAE<EEEEEEEEEEEEE/EEE/EEEEEEEEAEEE6EEEEEA AS:i:-3 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:0C146 YT:Z:UU RG:Z:BMG
NB501381:434:H5732BGXK:2:11302:5169:4814 0 chr7 22149254 1 147M * 0 0 TACACAAACAGGACCCAACATTCTGCTGCAAACAGGAAACACATCTCAAGGAAAAAGACAGACATTACCTCAGAGTGAAAGGCTGGAAAACAATTTTCCAAGCAAATGGTCTGAAGAAACAAGCTGGAGTAGCCATTCTAATATCAC AEEEEEE//EEEE/<//EEEAAAAE/EEA//E<E/AEAEEE//AE6EEE//EEEEE/AAE<EAEA/E<EE<AEAEE/EEE<//AEAEE/EEEEAAEAEAEE/EAEEE6EEAEE///<E/<AA/AAE/A/E<<<<6AAEE/EAEEAA/ AS:i:-9 XS:i:-9 XN:i:0 XM:i:3 XO:i:0 XG:i:0 NM:i:3 MD:Z:29T0T115G0 YT:Z:UU RG:Z:BMG
NB501381:434:H5732BGXK:3:21504:13617:9739 4 * 0 0 * * 0 0 GATCTCACTTTGTAGCCTTGGCTAGCTTCAAATTCATGATCGATCCAGAAGAGAGAAATGATGGCAAGAACTTTTATCTGGGGAGGTGGGGGTATGAAGAGGGGGTTGTCTACGGTTGTACATACACCTGCACATTAGCCTAAGCCT AEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEE/EEEEEEEEEEEEEEEEAAEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEAEEEEEAEEEEEEE/A<<AEEAEAEAEAEAAEEEA<A/ YT:Z:UU RG:Z:BMG
NB501381:434:H5732BGXK:4:13602:12680:1470 4 * 0 0 * * 0 0 TCTCAGAACTGTATCTTGCCAAAGCCTTTGCCTAACGAGGCTTTGAACTCAGGAGGCTCTTTCCTGCTAAAGCAAAATTAAATGAAATCTCTCACTGATCGATCTGTATCTTGTTCTTTGTTCTTCTAGAGCCCTAGTACATTGCAA AEEEEEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEA/EEEEAEEEEEAEEAEEEAEEEEEEEEEEEA6<<AAAE<A/<E6A< YT:Z:UU RG:Z:BMG
cat temp_trial.fastq
@NB501381:434:H5732BGXK:3:21504:13617:9739 BC:Z:AGTTGCCG+TTGGCATC ZA:Z:CTGT ZB:Z:GAGT RX:Z:CTG-GAG QX:Z:AAA AAA
GATCTCACTTTGTAGCCTTGGCTAGCTTCAAATTCATGATCGATCCAGAAGAGAGAAATGATGGCAAGAACTTTTATCTGGGGAGGTGGGGGTATGAAGAGGGGGTTGTCTACGGTTGTACATACACCTGCACATTAGCCTAAGCCT
+
AEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEE/EEEEEEEEEEEEEEEEAAEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEAEEEEEAEEEEEEE/A<<AEEAEAEAEAEAAEEEA<A/
BC:Z:AGTTGCCG+TTGGCATC ZA:Z:CTGT ZB:Z:GAGT RX:Z:CTG-GAG QX:Z:AAA AAA@NB501381:434:H5732BGXK:4:13602:12680:1470 BC:Z:AGTTGCCG+TTGGCATC ZA:Z:GTCT ZB:Z:CCTGT RX:Z:GTC-CCT QX:Z:AAA AAA
TCTCAGAACTGTATCTTGCCAAAGCCTTTGCCTAACGAGGCTTTGAACTCAGGAGGCTCTTTCCTGCTAAAGCAAAATTAAATGAAATCTCTCACTGATCGATCTGTATCTTGTTCTTTGTTCTTCTAGAGCCCTAGTACATTGCAA
+
AEEEEEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEA/EEEEAEEEEEAEEAEEEAEEEEEEEEEEEA6<<AAAE<A/<E6A<
BC:Z:AGTTGCCG+TTGGCATC ZA:Z:GTCT ZB:Z:CCTGT RX:Z:GTC-CCT QX:Z:AAA AAA
Hello, I pushed a change to bug_fixes that should resolve this issue. Here's my sample run:
$ ./bowtie2 --debug -x example/index/lambda_virus test.fq --sam-append-comment --un foo.fq --sam-nohead
Warning: Running in debug mode. Please use debug mode only for diagnosing errors, and not for typical use of Bowtie 2.
2 reads; of these:
2 (100.00%) were unpaired; of these:
2 (100.00%) aligned 0 times
0 (0.00%) aligned exactly 1 time
0 (0.00%) aligned >1 times
0.00% overall alignment rate
NB501381:434:H5732BGXK:3:21504:13617:9739 4 * 0 0 * * 0 0 GATCTCACTTTGTAGCCTTGGCTAGCTTCAAATTCATGATCGATCCAGAAGAGAGAAATGATGGCAAGAACTTTTATCTGGGGAGGTGGGGGTATGAAGAGGGGGTTGTCTACGGTTGTACATACACCTGCACATTAGCCTAAGCCT AEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEE/EEEEEEEEEEEEEEEEAAEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEAEEEEEAEEEEEEE/A<<AEEAEAEAEAEAAEEEA<A/ YT:Z:UU BC:Z:AGTTGCCG+TTGGCATC ZA:Z:CTGT ZB:Z:GAGT RX:Z:CTG-GAG QX:Z:AAA AAA
NB501381:434:H5732BGXK:4:13602:12680:1470 4 * 0 0 * * 0 0 TCTCAGAACTGTATCTTGCCAAAGCCTTTGCCTAACGAGGCTTTGAACTCAGGAGGCTCTTTCCTGCTAAAGCAAAATTAAATGAAATCTCTCACTGATCGATCTGTATCTTGTTCTTTGTTCTTCTAGAGCCCTAGTACATTGCAA AEEEEEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEA/EEEEAEEEEEAEEAEEEAEEEEEEEEEEEA6<<AAAE<A/<E6A< YT:Z:UU BC:Z:AGTTGCCG+TTGGCATC ZA:Z:GTCT ZB:Z:CCTGT RX:Z:GTC-CCT QX:Z:AAA AAA
$ cat foo.fq
@NB501381:434:H5732BGXK:3:21504:13617:9739 BC:Z:AGTTGCCG+TTGGCATC ZA:Z:CTGT ZB:Z:GAGT RX:Z:CTG-GAG QX:Z:AAA AAA
GATCTCACTTTGTAGCCTTGGCTAGCTTCAAATTCATGATCGATCCAGAAGAGAGAAATGATGGCAAGAACTTTTATCTGGGGAGGTGGGGGTATGAAGAGGGGGTTGTCTACGGTTGTACATACACCTGCACATTAGCCTAAGCCT
+
AEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEE/EEEEEEEEEEEEEEEEAAEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEAEEEEEAEEEEEEE/A<<AEEAEAEAEAEAAEEEA<A/
@NB501381:434:H5732BGXK:4:13602:12680:1470 BC:Z:AGTTGCCG+TTGGCATC ZA:Z:GTCT ZB:Z:CCTGT RX:Z:GTC-CCT QX:Z:AAA AAA
TCTCAGAACTGTATCTTGCCAAAGCCTTTGCCTAACGAGGCTTTGAACTCAGGAGGCTCTTTCCTGCTAAAGCAAAATTAAATGAAATCTCTCACTGATCGATCTGTATCTTGTTCTTTGTTCTTCTAGAGCCCTAGTACATTGCAA
+
AEEEEEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEA/EEEEAEEEEEAEEAEEEAEEEEEEEEEEEA6<<AAAE<A/<E6A<
Please let us know if you encounter any issues with these changes.
Hello, I also meet a similar problem when running --sam-append-comment and --no-unal at the same time. There is no output of tags.