bamtools icon indicating copy to clipboard operation
bamtools copied to clipboard

read with single base and quality = 9

Open grenaud opened this issue 11 years ago • 6 comments

I have a bam record with a single base "A" and a quality score of 9 which is a "*" in the ASCII table (33+9). However, I would expect to see a 42 but xxd -b tells me I have: 00010000 11111111 however, I would expect this: 00010000 00001001

Can this be fixed ?

grenaud avatar Aug 16 '14 18:08 grenaud

Don't forget that BAM is not "simply@ binary, but compressed as well. If you're just using cmd line tools, run it through (samtools/tabix) 'bgzip -dc' and pipe into xxd.

pezmaster31 avatar Aug 16 '14 19:08 pezmaster31

Hi Derek ! Thanks for your reply. Yes the message above is decommpressed and piped into xxd.

grenaud avatar Aug 16 '14 19:08 grenaud

Ok, just wanted to check the easy stuff first. :)

Looked a bit into this, and it's basically due to the fact that '*' is a special char in SAM, indicating 'no data' for that field.

So for compatibility's sake: when BT is writing BAM records and sees there's a single quality char that's an asterisk, it sets the qualities string to all 1's (BAM way of indicating no data).

I guess I can ignore the SAM-compatible behavior, and write the actual value, if the bases field is length 1 .

pezmaster31 avatar Aug 16 '14 20:08 pezmaster31

That would be great ! I understand when we read SAM, it is impossible to tell whether the quality field is empty but this is via C++ code and we can disentangle both cases. I noticed when another program started complaining that my other program was producing empty fields.

grenaud avatar Aug 16 '14 20:08 grenaud

Sorry for the inconvenience. Guess the idea of a 1-base alignment never occurred to me when I wrote that little workaround. :)

pezmaster31 avatar Aug 16 '14 20:08 pezmaster31

yeah we deal with ancient DNA and we can end up with really tiny sequences.

grenaud avatar Aug 16 '14 20:08 grenaud