
Review handling of SAM 1.4+ CIGAR ops

Open bede opened this issue 8 years ago • 6 comments

Review handling of the N, H, P, X and = ops. X and = were introduced in SAM 1.4 and remove the need to concurrently examine the MD tag. Spec: https://samtools.github.io/hts-specs/SAMv1.pdf

bede avatar Jul 23 '17 14:07 bede
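For reference while reviewing, here is a minimal sketch of how the nine CIGAR ops differ in whether they consume query and reference bases, following the table in the SAM spec. The names `CONSUMES`, `parse_cigar` and `spans` are made up for illustration and are not taken from kindel or simplesam.

```python
import re

# One or more <length><op> pairs, using the nine ops defined by the SAM spec.
CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_PAIR = re.compile(r"(\d+)([MIDNSHP=X])")

# op -> (consumes query, consumes reference), per the SAM spec table.
CONSUMES = {
    "M": (True, True),    # alignment match (match or mismatch)
    "I": (True, False),   # insertion to the reference
    "D": (False, True),   # deletion from the reference
    "N": (False, True),   # skipped reference region (e.g. intron)
    "S": (True, False),   # soft clip: bases present in SEQ
    "H": (False, False),  # hard clip: bases absent from SEQ
    "P": (False, False),  # padding
    "=": (True, True),    # sequence match
    "X": (True, True),    # sequence mismatch
}


def parse_cigar(cigar):
    """Return a list of (length, op) tuples; '*' means no CIGAR available."""
    if cigar == "*":
        return []
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_PAIR.findall(cigar)]


def spans(cigar):
    """Return (query_length, reference_length) covered by the CIGAR."""
    query = ref = 0
    for length, op in parse_cigar(cigar):
        uses_query, uses_ref = CONSUMES[op]
        if uses_query:
            query += length
        if uses_ref:
            ref += length
    return query, ref
```

For example, `spans("2H3S10=1X2I4=3D5N6=")` gives a query length of 26 and a reference length of 29: H and P contribute to neither, S and I only to the query, D and N only to the reference.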

Is this something that should be reviewed in mdshw5/simplesam as well?

mdshw5 avatar Aug 21 '17 19:08 mdshw5

Hi @mdshw5 : ) Does Simplesam have operation-specific functionality? Or does it simply parse CIGAR ops as single char strings at face value? If the latter, it should be fine. BBMap is the only popular aligner implementing the 1.4 spec at this time AFAIK. So long as it handles X and = ops it'll be fine.

bede avatar Aug 22 '17 08:08 bede

No, simplesam only uses the CIGAR operations to inspect the alignment of the sequence, but does not interpret the match/mismatch operators. It seems like the SAM 1.4 CIGAR operations don't completely replace the MD tag; they only make it unnecessary when you don't care about the exact sequence of the reference.

Anyway, I cut a new release (0.1.2) which should be on PyPI in a few minutes.

mdshw5 avatar Aug 22 '17 18:08 mdshw5

Sorry for not being clearer. Yes, it only means that the MD tag is redundant for some applications.

bede avatar Aug 22 '17 18:08 bede

You got me excited, since it really bothers me that we have two strings (CIGAR and MD) that basically represent the same information in a slightly different way.

mdshw5 avatar Aug 22 '17 18:08 mdshw5

Ah yes. At least we can now count indels and substitutions without the MD this way. Thanks again for making Simplesam.

bede avatar Aug 22 '17 18:08 bede
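The counting mentioned in the last comment can be sketched roughly as follows. This assumes the aligner emits extended CIGARs (= and X instead of M); the function name `count_variants` and the returned keys are hypothetical, not part of kindel or simplesam. A CIGAR containing M is rejected, because M does not distinguish matches from mismatches and the MD tag would still be needed.

```python
import re

CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_PAIR = re.compile(r"(\d+)([MIDNSHP=X])")


def count_variants(cigar):
    """Count substitutions and indels from an extended (=/X) CIGAR string."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    counts = {
        "matches": 0,          # bases under '='
        "substitutions": 0,    # bases under 'X'
        "insertions": 0,       # number of 'I' events
        "inserted_bases": 0,   # total length of 'I' events
        "deletions": 0,        # number of 'D' events
        "deleted_bases": 0,    # total length of 'D' events
    }
    for n, op in CIGAR_PAIR.findall(cigar):
        length = int(n)
        if op == "M":
            # 'M' hides mismatches, so substitutions cannot be counted.
            raise ValueError("CIGAR uses M; MD tag needed to find mismatches")
        if op == "=":
            counts["matches"] += length
        elif op == "X":
            counts["substitutions"] += length
        elif op == "I":
            counts["insertions"] += 1
            counts["inserted_bases"] += length
        elif op == "D":
            counts["deletions"] += 1
            counts["deleted_bases"] += length
        # N, S, H and P are neither substitutions nor indels; skip them.
    return counts
```

As noted earlier in the thread, this tells you where and how many mismatches there are, but not which reference bases were involved; for that the MD tag (or the reference itself) is still required.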