ngmlr

base qualities in bam file are not reversed if read maps to the reverse strand

Open mroosmalen opened this issue 5 years ago • 3 comments

The base-quality field in the BAM file is not reversed when the read maps to the reverse strand. This means the qualities no longer correspond to the right bases in the SEQ field. For example, this FASTQ record:

@dadd3e96-6f0e-4bbf-9670-d320760a3654 runid=787c7726d7574aa9975fdda43dfba28e5bcfb55c sampleid=AML12246DNeasyPro read=38 ch=1037 start_time=2018-07-26T11:34:42Z
ATCATTATTACTTCATTCAGTTACGTATTGCTTTTCCTTCAAAGGTGC
+
*1($+%$(%$#&$$%&(*'(+/,,)2$&'&+289530/2(..&'$'$#

ends up in the BAM file as:

dadd3e96-6f0e-4bbf-9670-d320760a3654	16	15	59557221	60	9S7M2D5M1I2M1D24M	*	0	0	GCACCTTTGAAGGAAAAGCAATACGTAACTGAATGAAGTAATAATGAT	*1($+%$(%$#&$$%&(*'(+/,,)2$&'&+289530/2(..&'$'$#	AS:i:64	NM:i:9	XI:f:0.7857	XS:i:0	XE:i:64	XR:i:39	MD:Z:3A3^CA1A5^A0A1T1A19	SV:i:0	QS:i:9	QE:i:48	CV:f:81.25	ID:i:0	KB:f:133.646	SB:f:133.646
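
To make the expected orientation concrete, here is a minimal Python sketch of what a reverse-strand record (flag 16) would carry if both fields were flipped together: SEQ reverse-complemented and QUAL reversed, so each quality stays under its base. The function names are illustrative and are not taken from ngmlr.

```python
from typing import Tuple

# Translation table for complementing bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def sam_fields_for_reverse_strand(seq: str, qual: str) -> Tuple[str, str]:
    """Return (SEQ, QUAL) oriented for a read aligned to the reverse strand."""
    if len(seq) != len(qual):
        raise ValueError("SEQ and QUAL must have the same length")
    # QUAL is only reversed, never complemented.
    return reverse_complement(seq), qual[::-1]


# Read taken from the FASTQ record above.
fastq_seq = "ATCATTATTACTTCATTCAGTTACGTATTGCTTTTCCTTCAAAGGTGC"
fastq_qual = "*1($+%$(%$#&$$%&(*'(+/,,)2$&'&+289530/2(..&'$'$#"

sam_seq, sam_qual = sam_fields_for_reverse_strand(fastq_seq, fastq_qual)
print(sam_seq)   # matches the SEQ field in the BAM record
print(sam_qual)  # differs from the QUAL field in the BAM record
```

With this read, the computed SEQ matches the BAM record, while the computed QUAL is the mirror image of what the BAM record contains.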

mroosmalen • Aug 20 '18 10:08

Hi, it's been a while since I looked at these things, but as far as I remember the base qualities do not need to be reversed in the SAM/BAM output. Also, the SAM format does not state anything about that.

Thanks, Fritz

fritzsedlazeck • Aug 29 '18 14:08

I think it should be reversed, because there is a quality value for each base in SEQ. In this case the first A has an "original" base quality of * in the FASTQ, but in the SAM/BAM output that base is the last T (because the sequence is reverse-complemented), and there it has a base quality of #.

Now I can't go back from the SAM/BAM format to the "original" FASTQ format using PicardTools. This will result in: ORIGINAL FASTQ

@dadd3e96-6f0e-4bbf-9670-d320760a3654 runid=787c7726d7574aa9975fdda43dfba28e5bcfb55c sampleid=AML12246DNeasyPro read=38 ch=1037 start_time=2018-07-26T11:34:42Z
ATCATTATTACTTCATTCAGTTACGTATTGCTTTTCCTTCAAAGGTGC
+
*1($+%$(%$#&$$%&(*'(+/,,)2$&'&+289530/2(..&'$'$#

vs. CONVERTED FASTQ

@dadd3e96-6f0e-4bbf-9670-d320760a3654 runid=787c7726d7574aa9975fdda43dfba28e5bcfb55c sampleid=AML12246DNeasyPro read=38 ch=1037 start_time=2018-07-26T11:34:42Z
ATCATTATTACTTCATTCAGTTACGTATTGCTTTTCCTTCAAAGGTGC
+
#$'$'&..(2/035982+&'&$2),,/+('*(&%$$&#$%($%+$(1*

FYI: I know that minimap2, for example, does reverse the base-quality string.
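
The round trip described here can be sketched in a few lines of Python. This is not PicardTools' code, only a minimal illustration that assumes the converter restores the original read orientation whenever flag bit 0x10 is set; `sam_to_fastq_fields` is a hypothetical helper name.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
REVERSE_FLAG = 0x10  # SAM flag bit: read aligned to the reverse strand


def sam_to_fastq_fields(flag, seq, qual):
    """Undo the reverse-strand orientation of SEQ and QUAL (assumed converter behaviour)."""
    if flag & REVERSE_FLAG:
        return seq.translate(COMPLEMENT)[::-1], qual[::-1]
    return seq, qual


original_seq = "ATCATTATTACTTCATTCAGTTACGTATTGCTTTTCCTTCAAAGGTGC"
original_qual = "*1($+%$(%$#&$$%&(*'(+/,,)2$&'&+289530/2(..&'$'$#"
sam_seq = "GCACCTTTGAAGGAAAAGCAATACGTAACTGAATGAAGTAATAATGAT"

# Case 1: QUAL left in FASTQ order, as in the BAM record reported above.
faulty_seq, faulty_qual = sam_to_fastq_fields(16, sam_seq, original_qual)

# Case 2: QUAL reversed together with SEQ.
fixed_seq, fixed_qual = sam_to_fastq_fields(16, sam_seq, original_qual[::-1])

print(faulty_qual == original_qual)  # qualities come back mirrored
print(fixed_qual == original_qual)   # qualities survive the round trip
```

In both cases the sequence comes back intact. Only the quality string depends on how the aligner wrote QUAL.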

mroosmalen • Aug 30 '18 05:08

Hi!

Thanks for reporting this! You are right, the quality string should be reversed and ngmlr is actually doing it, unless the quality string starts with an '*' (which is quite a ridiculous bug I have to say). I just pushed a fix to the master branch for this and will put together a new release tomorrow.
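
For readers wondering how a leading '*' can matter: in SAM, a QUAL field consisting of the single character `*` means the qualities are not stored, but `*` is also a valid Phred+33 quality character (Phred 9). The Python sketch below shows one plausible way this class of bug can arise, by testing only the first character. It is a hypothetical illustration with made-up function names, not ngmlr's actual code or its fix.

```python
def qual_is_missing_buggy(qual: str) -> bool:
    # Hypothetical faulty check: any string that merely starts with '*' counts as missing.
    return qual.startswith("*")


def qual_is_missing(qual: str) -> bool:
    # Stricter check: only the lone "*" placeholder counts as missing.
    return qual == "*"


def oriented_qual(qual: str, is_reverse: bool, is_missing) -> str:
    """Reverse QUAL for reverse-strand reads unless it is judged to be missing."""
    if is_missing(qual) or not is_reverse:
        return qual
    return qual[::-1]


# Quality string from the read in this issue; its first character is '*'.
qual = "*1($+%$(%$#&$$%&(*'(+/,,)2$&'&+289530/2(..&'$'$#"

print(oriented_qual(qual, True, qual_is_missing_buggy) == qual)  # left unreversed
print(oriented_qual(qual, True, qual_is_missing) == qual[::-1])  # reversed as expected
```

Under this assumption, only reads whose first base happens to have Phred quality 9 would be affected, which would explain why the problem is easy to miss.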

Thanks, Philipp

philres • Sep 03 '18 21:09