ngmlr
base qualities in bam file are not reversed if read maps to the reverse strand
The base-quality field in the BAM file is not reversed if the read is mapped to the reverse strand. This means that the qualities do not correspond to the right bases in the SEQ field. For example, this FASTQ record:
@dadd3e96-6f0e-4bbf-9670-d320760a3654 runid=787c7726d7574aa9975fdda43dfba28e5bcfb55c sampleid=AML12246DNeasyPro read=38 ch=1037 start_time=2018-07-26T11:34:42Z
ATCATTATTACTTCATTCAGTTACGTATTGCTTTTCCTTCAAAGGTGC
+
*1($+%$(%$#&$$%&(*'(+/,,)2$&'&+289530/2(..&'$'$#
will end up in the bam file like:
dadd3e96-6f0e-4bbf-9670-d320760a3654 16 15 59557221 60 9S7M2D5M1I2M1D24M * 0 0 GCACCTTTGAAGGAAAAGCAATACGTAACTGAATGAAGTAATAATGAT *1($+%$(%$#&$$%&(*'(+/,,)2$&'&+289530/2(..&'$'$# AS:i:64 NM:i:9 XI:f:0.7857 XS:i:0 XE:i:64 XR:i:39 MD:Z:3A3^CA1A5^A0A1T1A19 SV:i:0 QS:i:9 QE:i:48 CV:f:81.25 ID:i:0 KB:f:133.646 SB:f:133.646
Hi, it's been a while since I looked at these things, but as far as I remember the base qualities do not need to be reversed in the SAM/BAM output. Also, the SAM format does not state anything about that.
Thanks Fritz
I think it should be reversed, because there is a quality value for each base in SEQ. In this case the first A has an "original" base quality of * in the FASTQ, but in the SAM/BAM output that base is the last T (because the sequence is reverse-complemented) and it has a base quality of #.
Now I can't go back from the SAM/BAM format to the "original" FASTQ format using PicardTools. This will result in: ORIGINAL FASTQ
@dadd3e96-6f0e-4bbf-9670-d320760a3654 runid=787c7726d7574aa9975fdda43dfba28e5bcfb55c sampleid=AML12246DNeasyPro read=38 ch=1037 start_time=2018-07-26T11:34:42Z
ATCATTATTACTTCATTCAGTTACGTATTGCTTTTCCTTCAAAGGTGC
+
*1($+%$(%$#&$$%&(*'(+/,,)2$&'&+289530/2(..&'$'$#
vs. CONVERTED FASTQ
@dadd3e96-6f0e-4bbf-9670-d320760a3654 runid=787c7726d7574aa9975fdda43dfba28e5bcfb55c sampleid=AML12246DNeasyPro read=38 ch=1037 start_time=2018-07-26T11:34:42Z
ATCATTATTACTTCATTCAGTTACGTATTGCTTTTCCTTCAAAGGTGC
+
#$'$'&..(2/035982+&'&$2),,/+('*(&%$$&#$%($%+$(1*
FYI: I know that minimap2, for example, does reverse the basequality string.
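To make the expected behaviour concrete, here is a minimal Python sketch of how SEQ and QUAL would be oriented for a reverse-strand read, using the read from this issue. The function name `to_sam_orientation` is made up for illustration and is not part of ngmlr, minimap2 or PicardTools. It assumes only plain A/C/G/T/N bases.

```python
# Complement table for plain DNA bases (assumption: no IUPAC ambiguity codes).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def to_sam_orientation(seq, qual, is_reverse):
    """Return (SEQ, QUAL) in the orientation expected in a SAM record.

    Hypothetical helper for illustration only.
    """
    if not is_reverse:
        return seq, qual
    # Reverse-complement the bases and reverse the qualities,
    # so that qual[i] still belongs to seq[i].
    return seq.translate(_COMPLEMENT)[::-1], qual[::-1]


fastq_seq = "ATCATTATTACTTCATTCAGTTACGTATTGCTTTTCCTTCAAAGGTGC"
fastq_qual = "*1($+%$(%$#&$$%&(*'(+/,,)2$&'&+289530/2(..&'$'$#"

sam_seq, sam_qual = to_sam_orientation(fastq_seq, fastq_qual, is_reverse=True)
print(sam_seq)
print(sam_qual)

# Applying the same operation again restores the original FASTQ orientation.
assert to_sam_orientation(sam_seq, sam_qual, True) == (fastq_seq, fastq_qual)
```

With this orientation the SEQ matches the one in the BAM record above, and the quality string starts with # (the quality of the final FASTQ base), so a BAM-to-FASTQ conversion would get back the original record.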
Hi!
Thanks for reporting this! You are right, the quality string should be reversed and ngmlr is actually doing it, unless the quality string starts with an '*' (which is quite a ridiculous bug I have to say). I just pushed a fix to the master branch for this and will put together a new release tomorrow.
Thanks, Philipp
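For readers wondering how a leading '*' can matter: in SAM, a QUAL field that is exactly `*` means that no qualities are stored, while `*` is also the valid Phred+33 character for quality 9. The sketch below is purely hypothetical and is not taken from the ngmlr source. It only shows how a check on the first character, rather than on the whole field, would misclassify the read from this issue.

```python
def qual_is_missing(qual):
    """QUAL is absent only when the whole field is the single character '*'."""
    return qual == "*"


def qual_is_missing_first_char_only(qual):
    """Hypothetical faulty check: looks at the first character only."""
    return qual.startswith("*")


read_qual = "*1($+%$(%$#&$$%&(*'(+/,,)2$&'&+289530/2(..&'$'$#"

print(qual_is_missing(read_qual))                  # False: qualities are present
print(qual_is_missing_first_char_only(read_qual))  # True: wrongly treated as missing
```

A read whose quality string happens to begin with Phred 9 would then be treated as if it had no qualities, and any step that depends on qualities being present, such as reversing them, could be skipped.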