Represent U in SAM rather than T for reads from RNA

Open Psy-Fer opened this issue 1 year ago • 12 comments

Before https://github.com/samtools/htslib/pull/1854, U was changed to N when read by samtools.

Now it will be changed to T.

However, I think it would be "better" if we could preserve U in SAM, even when moving SAM->BAM->CRAM->SAM for example.

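There is a problem, however: BAM stores each base in 4 bits, and all 16 of those codes are already assigned to the IUPAC symbols (`=ACMGRSVTWYHKDBN`), so there is no room for a separate U and the T code has to stand for both T and U.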
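To make the constraint concrete, here is a minimal Python sketch of the 4-bit mapping. The code string is the one given in the BAM specification; the function names are made up for illustration, and the U-to-T step simply mirrors the behaviour described above rather than reproducing htslib's actual code.

```python
# BAM's 4-bit base codes, in order 0..15, as listed in the SAM/BAM specification.
BAM_CODES = "=ACMGRSVTWYHKDBN"

def encode_base(base: str) -> int:
    """Return the 4-bit code for one SAM base character (illustrative helper)."""
    base = base.upper()
    if base == "U":
        base = "T"  # U has no code of its own, so it shares T's
    idx = BAM_CODES.find(base)
    # Anything not in the table falls back to N in this sketch.
    return idx if idx >= 0 else BAM_CODES.index("N")

def decode_base(code: int) -> str:
    """Return the SAM character for a 4-bit code (illustrative helper)."""
    return BAM_CODES[code]

def round_trip(seq: str) -> str:
    """Simulate SAM -> BAM -> SAM for a sequence string."""
    return "".join(decode_base(encode_base(b)) for b in seq)
```

With this mapping, `round_trip("ACGU")` comes back as `"ACGT"`: once the sequence is packed, nothing in the 4-bit codes records that the original base was U.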

A solution to this, raised by @jmarshall, could be to allocate a FLAG bit to indicate that an alignment record is RNA, which would then mean that a T coming from a BAM would be written as a U when viewed in SAM.
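A rough Python sketch of how a writer might apply such a flag. The bit value `0x1000` and the names below are purely hypothetical, picked only because that bit is not currently assigned in the SAM specification; nothing here is an agreed design.

```python
# Hypothetical flag bit: 0x1000 is NOT defined in the SAM specification.
# It is used here only to illustrate the proposal.
HYPOTHETICAL_RNA_FLAG = 0x1000

def sam_sequence(decoded_seq: str, flag: int) -> str:
    """Render a BAM-decoded sequence for SAM output.

    If the (hypothetical) RNA flag bit is set, every T is written as U;
    otherwise the sequence is returned unchanged.
    """
    if flag & HYPOTHETICAL_RNA_FLAG:
        return decoded_seq.replace("T", "U")
    return decoded_seq
```

For example, `sam_sequence("ACGT", HYPOTHETICAL_RNA_FLAG)` gives `"ACGU"`, while the same call with flag `0` gives `"ACGT"`. A tool that ignores the bit would keep seeing T in the BAM data, which is why most existing tools would be unaffected.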

This would also mean most existing tools would still work, while preparing for future RNA sequencing methods by representing the base that is actually being measured.

Another solution (though more ad hoc and less "good") would be to add yet another SAM tag to denote that the read is from RNA. This saves using a FLAG bit, but adds more complexity to the solution.

Cheers, James

Psy-Fer, Oct 31 '24 03:10