bamUtil
bamUtil copied to clipboard
CIGAR does not evaluate to the same length as SEQ
This warning is produced by validate
even if the SEQ is set to *
. The SAM spec explicitly states that it is ok for the SEQ to be *
and that the lengths don't have to match:
If not a ‘*’, the length of the sequence must equal the sum of lengths of M/I/S/=/X operations in CIGAR
It even states that SEQ should be *
for secondary alignments:
SEQ and QUAL of secondary alignments should be set to ‘*’ to reduce the file size.
Or am I misunderstanding something?
Thanks!
Thanks for pointing this out and finding the relevant parts of the documentation. I believe that you are correct. bamUtil won't warn if the CIGAR is *
, but doesn't check SEQ's contents.
That validation is done in libStatGen, and I'm afraid that some programs using that library might make assumptions without knowing about *
. I'll test things and report back.
I had not considered secondary reads so when I put in the validation, I didn't see a situation where the cigar would have value, but there would be no sequence. You are correct, that validation does not seem to be valid, so I will try to fix that validation to only validate if SEQ is not '*'. I should be able to do this sometime this week. I'll let you know when complete.
I updated libStatGen so that the cigar/seq length validation should only occur if the seq is not '*'. Please use the most recent libStatGen and let me know if this fixes the problem or if you continue to have issues.