bowtie2 icon indicating copy to clipboard operation
bowtie2 copied to clipboard

feature request: functionality like "bwa mem -C" to move fastq comments to sam tags

Open eboyden opened this issue 4 years ago • 7 comments

Hi, it would be great if Bowtie 2 could move fastq read header comments to sam tags, similar to the "-C" function of BWA MEM. Although Bowtie 2 supports alignment of ubams (unlike BWA), which would preserve tags that already exist, unfortunately I must use fastqs as Bowtie input due to necessary upstream processing tools, which means moving tags I need to preserve to fastq header comments using samtools fastq -T.

It looks like others are also interested in this functionality: https://www.biostars.org/p/405778/

Thanks!

eboyden avatar Apr 06 '20 01:04 eboyden

This one did not require much effort so I went ahead and put together a quick implementation of what I think you're asking for. Feedback is appreciated.

Input

==> example/reads/reads_1.fq <==
@r1 BC:Z:CGTAC
TGAATGCGAACTCCGGGACGCTCAGTAATGTGACGATAGCTGAAAACTGTACGATAAACNGTACGCTGAGGGCAGAAAAAATCGTCGGGGACATTNTAAAGGCGGCGAGCGCGGCTTTTCCG
+
+"@6<:27(F&5)9)"B:%B+A-%5A?2$HCB0B+0=D<7E/<.03#!.F77@6B==?C"7>;))%;,3-$.A06+<-1/@@?,26">=?*@'0;$:;??G+:#+(A?9+10!8!?()?7C>

==> example/reads/reads_2.fq <==
@r1 BC:Z:CGTAG
NAACGTCAGCGGAAGCACCACTATCTGGCGATCAAAAGGATGGTCATCGGTCACGGTGACAGTACGGGTACCTGACGGCCAGTCCACACTGCTTTCACGCTGGCGCGGNAAAGCCGCGCTCGCCGCCTNTACNATGTCCCCGACGATTTTTTCCGCCCTCAGCGTACCGTTTATCGTACAGTTTTCAGCTATCGTCAC
+
&?$F&9AC8H49@5<*>=E;2/G+;9?&8D.;E!2:0B/)++#F-><E,-/&F:&/#B=*'C?@$0!+#1?.ABC?:-?9GA?,11.2>3$&@-'6;!6@D?&#*E:C#9-4&8?"))H>H9-->((E99;:@::1=1-!+$@$8+#@%(<B,+%6!>@0G*D5A'(>9AC<;@01:7@:@0D&D=@:>7"1)D+9+B

Command line: ./bowtie2-align-s -x example/index/lambda_virus -1 example/reads/reads_1.fq -2 example/reads/reads_2.fq -u1 --sam-append-comment

Output:

1 reads; of these:
  1 (100.00%) were paired; of these:
    0 (0.00%) aligned concordantly 0 times
    1 (100.00%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    0 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    0 pairs aligned 0 times concordantly or discordantly; of these:
      0 mates make up the pairs; of these:
        0 (0.00%) aligned 0 times
        0 (0.00%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
100.00% overall alignment rate
@HD	VN:1.0	SO:unsorted
@SQ	SN:gi|9626243|ref|NC_001416.1|	LN:48502
@PG	ID:bowtie2	PN:bowtie2	VN:2.4.1	CL:"/home/rcharles/Development/git/bowtie2/bowtie2-align-s -x example/index/lambda_virus -1 example/reads/reads_1.fq -2 example/reads/reads_2.fq -u1 --sam-append-comment"
r1	99	gi|9626243|ref|NC_001416.1|	18401	42	122M	=	18430	227	TGAATGCGAACTCCGGGACGCTCAGTAATGTGACGATAGCTGAAAACTGTACGATAAACNGTACGCTGAGGGCAGAAAAAATCGTCGGGGACATTNTAAAGGCGGCGAGCGCGGCTTTTCCG	+"@6<:27(F&5)9)"B:%B+A-%5A?2$HCB0B+0=D<7E/<.03#!.F77@6B==?C"7>;))%;,3-$.A06+<-1/@@?,26">=?*@'0;$:;??G+:#+(A?9+10!8!?()?7C>	AS:i:-5	XN:i:0	XM:i:3	XO:i:0	XG:i:0	NM:i:3	MD:Z:59G13G21G26	YS:i:-4	YT:Z:CP	BC:Z:CGTAC
r1	147	gi|9626243|ref|NC_001416.1|	18430	42	198M	=	18401	-227	GTGACGATAGCTGAAAACTGTACGATAAACGGTACGCTGAGGGCGGAAAAAATCGTCGGGGACATNGTANAGGCGGCGAGCGCGGCTTTNCCGCGCCAGCGTGAAAGCAGTGTGGACTGGCCGTCAGGTACCCGTACTGTCACCGTGACCGATGACCATCCTTTTGATCGCCAGATAGTGGTGCTTCCGCTGACGTTN	B+9+D)1"7>:@=D&D0@:@7:10@;<CA9>('A5D*G0@>!6%+,B<(%@#+8$@$+!-1=1::@:;99E((>--9H>H))"?8&4-9#C:E*#&?D@6!;6'-@&$3>2.11,?AG9?-:?CBA.?1#+!0$@?C'*=B#/&:F&/-,E<>-F#++)/B0:2!E;.D8&?9;+G/2;E=>*<5@94H8CA9&F$?&	AS:i:-4	XN:i:0	XM:i:4	XO:i:0	XG:i:0	NM:i:4	MD:Z:65T3A19T107T0	YS:i:-5	YT:Z:CP	BC:Z:CGTAG

ch4rr0 avatar Sep 01 '20 23:09 ch4rr0

This looks like it does exactly what I had in mind, thanks! Can it handle multiple comments/tags?

eboyden avatar Sep 03 '20 22:09 eboyden

It should, as it considers everything after the first space in the read name to be a comment. It’s up to the user to make sure that the comment is formatted correctly in order to produce output compliant with the SAM spec.

ch4rr0 avatar Sep 03 '20 23:09 ch4rr0

We have included this feature in our most recent release, v2.4.2. Thank you for putting in the request.

ch4rr0 avatar Oct 06 '20 04:10 ch4rr0

I am trying this new feature, but I am hitting a problem when running --un with --sam-append-comment. When you specify to output unmapped reads to a new fastq, the resulting sam file does not have appended comments.

@PG	ID:bowtie2	PN:bowtie2	VN:2.4.4	CL:"/usr/local/bin/bowtie2-align-s --wrapper basic-0 --rg-id BMG --rg SM:634929898631_Cut_0 --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder --sam-append-comment -p 12 -x Bowtie2Index//genome -U read.fastq.gz"
NB501381:434:H5732BGXK:4:23606:22621:3148	16	chr8	124684412	42	147M	*	0	0	AAAAAGCAAAATGAGCCTCCAGTTGGGTTGTGGCTGGCCATGCCCCCCACACGGCACAGCACTTACCTAATGTGAGGCTGCAGGCCGGGCTGCTCTTCATAGCCGTCTATGAGGAAAGGCAGGTTCTTGGCCCAGTGGTCCTTGAAG	/EEAEEAEEA</<A<<//AAEAE</AE/EA<E<EA/AA/EEEEAEE<E/AEE<6EAEE<EE/AE<EAEEAEEAEEE<EAEE/AEA<AA<AEAEEEEE/A/A<EAE/EAE<EEEEEEEEEEEEE/EEE/EEEEEEEEAEEE6EEEEEA	AS:i:-3	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:0C146	YT:Z:UU	RG:Z:BMG	BC:Z:AGTTGCCG+TTGGCATC	ZA:Z:TGGT	ZB:Z:AAGT	RX:Z:TGG-AAG	QX:Z:AAA AA/
NB501381:434:H5732BGXK:2:11302:5169:4814	0	chr7	22149254	1	147M	*	0	0	TACACAAACAGGACCCAACATTCTGCTGCAAACAGGAAACACATCTCAAGGAAAAAGACAGACATTACCTCAGAGTGAAAGGCTGGAAAACAATTTTCCAAGCAAATGGTCTGAAGAAACAAGCTGGAGTAGCCATTCTAATATCAC	AEEEEEE//EEEE/<//EEEAAAAE/EEA//E<E/AEAEEE//AE6EEE//EEEEE/AAE<EAEA/E<EE<AEAEE/EEE<//AEAEE/EEEEAAEAEAEE/EAEEE6EEAEE///<E/<AA/AAE/A/E<<<<6AAEE/EAEEAA/	AS:i:-9	XS:i:-9	XN:i:0	XM:i:3	XO:i:0	XG:i:0	NM:i:3	MD:Z:29T0T115G0	YT:Z:UU	RG:Z:BMG	BC:Z:AGTTGCCG+TTGGCATC	ZA:Z:CTGT	ZB:Z:GAACT	RX:Z:CTG-GAA	QX:Z:/AA /AA
NB501381:434:H5732BGXK:3:21504:13617:9739	4	*	0	0	*	*	0	0	GATCTCACTTTGTAGCCTTGGCTAGCTTCAAATTCATGATCGATCCAGAAGAGAGAAATGATGGCAAGAACTTTTATCTGGGGAGGTGGGGGTATGAAGAGGGGGTTGTCTACGGTTGTACATACACCTGCACATTAGCCTAAGCCT	AEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEE/EEEEEEEEEEEEEEEEAAEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEAEEEEEAEEEEEEE/A<<AEEAEAEAEAEAAEEEA<A/	YT:Z:UU	RG:Z:BMG	BC:Z:AGTTGCCG+TTGGCATC	ZA:Z:CTGT	ZB:Z:GAGT	RX:Z:CTG-GAG	QX:Z:AAA AAA
NB501381:434:H5732BGXK:4:13602:12680:1470	4	*	0	0	*	*	0	0	TCTCAGAACTGTATCTTGCCAAAGCCTTTGCCTAACGAGGCTTTGAACTCAGGAGGCTCTTTCCTGCTAAAGCAAAATTAAATGAAATCTCTCACTGATCGATCTGTATCTTGTTCTTTGTTCTTCTAGAGCCCTAGTACATTGCAA	AEEEEEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEA/EEEEAEEEEEAEEAEEEAEEEEEEEEEEEA6<<AAAE<A/<E6A<	YT:Z:UU	RG:Z:BMG	BC:Z:AGTTGCCG+TTGGCATC	ZA:Z:GTCT	ZB:Z:CCTGT	RX:Z:GTC-CCT	QX:Z:AAA AAA
@PG	ID:bowtie2	PN:bowtie2	VN:2.4.4	CL:"/usr/local/bin/bowtie2-align-s --wrapper basic-0 --rg-id BMG --rg SM:634929898631_Cut_0 --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder --sam-append-comment -p 12 -x Bowtie2Index//genome --passthrough -U read.fastq.gz"
NB501381:434:H5732BGXK:4:23606:22621:3148	16	chr8	124684412	42	147M	*	0	0	AAAAAGCAAAATGAGCCTCCAGTTGGGTTGTGGCTGGCCATGCCCCCCACACGGCACAGCACTTACCTAATGTGAGGCTGCAGGCCGGGCTGCTCTTCATAGCCGTCTATGAGGAAAGGCAGGTTCTTGGCCCAGTGGTCCTTGAAG	/EEAEEAEEA</<A<<//AAEAE</AE/EA<E<EA/AA/EEEEAEE<E/AEE<6EAEE<EE/AE<EAEEAEEAEEE<EAEE/AEA<AA<AEAEEEEE/A/A<EAE/EAE<EEEEEEEEEEEEE/EEE/EEEEEEEEAEEE6EEEEEA	AS:i:-3	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:0C146	YT:Z:UU	RG:Z:BMG
NB501381:434:H5732BGXK:2:11302:5169:4814	0	chr7	22149254	1	147M	*	0	0	TACACAAACAGGACCCAACATTCTGCTGCAAACAGGAAACACATCTCAAGGAAAAAGACAGACATTACCTCAGAGTGAAAGGCTGGAAAACAATTTTCCAAGCAAATGGTCTGAAGAAACAAGCTGGAGTAGCCATTCTAATATCAC	AEEEEEE//EEEE/<//EEEAAAAE/EEA//E<E/AEAEEE//AE6EEE//EEEEE/AAE<EAEA/E<EE<AEAEE/EEE<//AEAEE/EEEEAAEAEAEE/EAEEE6EEAEE///<E/<AA/AAE/A/E<<<<6AAEE/EAEEAA/	AS:i:-9	XS:i:-9	XN:i:0	XM:i:3	XO:i:0	XG:i:0	NM:i:3	MD:Z:29T0T115G0	YT:Z:UU	RG:Z:BMG
NB501381:434:H5732BGXK:3:21504:13617:9739	4	*	0	0	*	*	0	0	GATCTCACTTTGTAGCCTTGGCTAGCTTCAAATTCATGATCGATCCAGAAGAGAGAAATGATGGCAAGAACTTTTATCTGGGGAGGTGGGGGTATGAAGAGGGGGTTGTCTACGGTTGTACATACACCTGCACATTAGCCTAAGCCT	AEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEE/EEEEEEEEEEEEEEEEAAEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEAEEEEEAEEEEEEE/A<<AEEAEAEAEAEAAEEEA<A/	YT:Z:UU	RG:Z:BMG
NB501381:434:H5732BGXK:4:13602:12680:1470	4	*	0	0	*	*	0	0	TCTCAGAACTGTATCTTGCCAAAGCCTTTGCCTAACGAGGCTTTGAACTCAGGAGGCTCTTTCCTGCTAAAGCAAAATTAAATGAAATCTCTCACTGATCGATCTGTATCTTGTTCTTTGTTCTTCTAGAGCCCTAGTACATTGCAA	AEEEEEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEA/EEEEAEEEEEAEEAEEEAEEEEEEEEEEEA6<<AAAE<A/<E6A<	YT:Z:UU	RG:Z:BMG
cat temp_trial.fastq
@NB501381:434:H5732BGXK:3:21504:13617:9739	BC:Z:AGTTGCCG+TTGGCATC	ZA:Z:CTGT	ZB:Z:GAGT	RX:Z:CTG-GAG	QX:Z:AAA AAA
GATCTCACTTTGTAGCCTTGGCTAGCTTCAAATTCATGATCGATCCAGAAGAGAGAAATGATGGCAAGAACTTTTATCTGGGGAGGTGGGGGTATGAAGAGGGGGTTGTCTACGGTTGTACATACACCTGCACATTAGCCTAAGCCT
+
AEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEE/EEEEEEEEEEEEEEEEAAEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEAEEEEEAEEEEEEE/A<<AEEAEAEAEAEAAEEEA<A/
	BC:Z:AGTTGCCG+TTGGCATC	ZA:Z:CTGT	ZB:Z:GAGT	RX:Z:CTG-GAG	QX:Z:AAA AAA@NB501381:434:H5732BGXK:4:13602:12680:1470	BC:Z:AGTTGCCG+TTGGCATC	ZA:Z:GTCT	ZB:Z:CCTGT	RX:Z:GTC-CCT	QX:Z:AAA AAA
TCTCAGAACTGTATCTTGCCAAAGCCTTTGCCTAACGAGGCTTTGAACTCAGGAGGCTCTTTCCTGCTAAAGCAAAATTAAATGAAATCTCTCACTGATCGATCTGTATCTTGTTCTTTGTTCTTCTAGAGCCCTAGTACATTGCAA
+
AEEEEEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEA/EEEEAEEEEEAEEAEEEAEEEEEEEEEEEA6<<AAAE<A/<E6A<
	BC:Z:AGTTGCCG+TTGGCATC	ZA:Z:GTCT	ZB:Z:CCTGT	RX:Z:GTC-CCT	QX:Z:AAA AAA

MikeWLloyd avatar Oct 25 '21 16:10 MikeWLloyd

Hello, I pushed a change to bug_fixes that should resolve this issue. Here's my sample run:

$ ./bowtie2 --debug -x example/index/lambda_virus test.fq --sam-append-comment --un foo.fq --sam-nohead
Warning: Running in debug mode.  Please use debug mode only for diagnosing errors, and not for typical use of Bowtie 2.
2 reads; of these:
  2 (100.00%) were unpaired; of these:
    2 (100.00%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.00% overall alignment rate
NB501381:434:H5732BGXK:3:21504:13617:9739	4	*	0	0	*	*	0	0	GATCTCACTTTGTAGCCTTGGCTAGCTTCAAATTCATGATCGATCCAGAAGAGAGAAATGATGGCAAGAACTTTTATCTGGGGAGGTGGGGGTATGAAGAGGGGGTTGTCTACGGTTGTACATACACCTGCACATTAGCCTAAGCCT	AEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEE/EEEEEEEEEEEEEEEEAAEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEAEEEEEAEEEEEEE/A<<AEEAEAEAEAEAAEEEA<A/	YT:Z:UU	BC:Z:AGTTGCCG+TTGGCATC	ZA:Z:CTGT	ZB:Z:GAGT	RX:Z:CTG-GAG	QX:Z:AAA AAA
NB501381:434:H5732BGXK:4:13602:12680:1470	4	*	0	0	*	*	0	0	TCTCAGAACTGTATCTTGCCAAAGCCTTTGCCTAACGAGGCTTTGAACTCAGGAGGCTCTTTCCTGCTAAAGCAAAATTAAATGAAATCTCTCACTGATCGATCTGTATCTTGTTCTTTGTTCTTCTAGAGCCCTAGTACATTGCAA	AEEEEEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEA/EEEEAEEEEEAEEAEEEAEEEEEEEEEEEA6<<AAAE<A/<E6A<	YT:Z:UU	BC:Z:AGTTGCCG+TTGGCATC	ZA:Z:GTCT	ZB:Z:CCTGT	RX:Z:GTC-CCT	QX:Z:AAA AAA

$ cat foo.fq
@NB501381:434:H5732BGXK:3:21504:13617:9739	BC:Z:AGTTGCCG+TTGGCATC	ZA:Z:CTGT	ZB:Z:GAGT	RX:Z:CTG-GAG	QX:Z:AAA AAA
GATCTCACTTTGTAGCCTTGGCTAGCTTCAAATTCATGATCGATCCAGAAGAGAGAAATGATGGCAAGAACTTTTATCTGGGGAGGTGGGGGTATGAAGAGGGGGTTGTCTACGGTTGTACATACACCTGCACATTAGCCTAAGCCT
+
AEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEE/EEEEEEEEEEEEEEEEAAEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEAEEEEEAEEEEEEE/A<<AEEAEAEAEAEAAEEEA<A/
@NB501381:434:H5732BGXK:4:13602:12680:1470	BC:Z:AGTTGCCG+TTGGCATC	ZA:Z:GTCT	ZB:Z:CCTGT	RX:Z:GTC-CCT	QX:Z:AAA AAA
TCTCAGAACTGTATCTTGCCAAAGCCTTTGCCTAACGAGGCTTTGAACTCAGGAGGCTCTTTCCTGCTAAAGCAAAATTAAATGAAATCTCTCACTGATCGATCTGTATCTTGTTCTTTGTTCTTCTAGAGCCCTAGTACATTGCAA
+
AEEEEEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEA/EEEEAEEEEEAEEAEEEAEEEEEEEEEEEA6<<AAAE<A/<E6A<

Please let us know if you encounter any issues with these changes.

ch4rr0 avatar Oct 25 '21 20:10 ch4rr0

Hello, I also meet a similar problem when running --sam-append-comment and --no-unal at the same time. There is no output of tags.

Biowood avatar Nov 18 '21 16:11 Biowood