bam/writer, sam/writer: Validate cigar/sequence/basequalities
To avoid writing corrupt or incompatible SAM/BAM files, an extra validation step is applied before writing. It checks that:
- The number of base qualities equals the sequence length, or the base qualities are empty
- The read length implied by the CIGAR operations matches the sequence length
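For illustration, the two checks can be sketched roughly as follows. This is a minimal standalone sketch, not the code in this branch: `Kind`, `ValidationError`, and `validate` are hypothetical names, and it assumes that only the M/I/S/=/X operations count toward the read length and that an empty CIGAR or sequence (`*`) skips the CIGAR check.

```rust
/// Hypothetical, simplified CIGAR operation kinds (not the library's own types).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind {
    Match,
    Insertion,
    Deletion,
    Skip,
    SoftClip,
    HardClip,
    Pad,
    SequenceMatch,
    SequenceMismatch,
}

impl Kind {
    /// Assumption: only M, I, S, = and X consume bases of the read.
    fn consumes_read(self) -> bool {
        matches!(
            self,
            Kind::Match
                | Kind::Insertion
                | Kind::SoftClip
                | Kind::SequenceMatch
                | Kind::SequenceMismatch
        )
    }
}

/// Hypothetical error type describing which check failed.
#[derive(Debug, PartialEq, Eq)]
enum ValidationError {
    QualityScoresLengthMismatch { expected: usize, actual: usize },
    CigarReadLengthMismatch { expected: usize, actual: usize },
}

/// Runs both checks on a record's fields before it is written.
fn validate(
    cigar: &[(Kind, usize)],
    sequence: &[u8],
    quality_scores: &[u8],
) -> Result<(), ValidationError> {
    // Check 1: base qualities are either absent or one per base.
    if !quality_scores.is_empty() && quality_scores.len() != sequence.len() {
        return Err(ValidationError::QualityScoresLengthMismatch {
            expected: sequence.len(),
            actual: quality_scores.len(),
        });
    }

    // Check 2: the read-consuming CIGAR operations add up to the sequence length.
    // Assumption: a missing CIGAR or missing sequence makes this check not applicable.
    let read_len: usize = cigar
        .iter()
        .filter(|(kind, _)| kind.consumes_read())
        .map(|(_, len)| *len)
        .sum();

    if !cigar.is_empty() && !sequence.is_empty() && read_len != sequence.len() {
        return Err(ValidationError::CigarReadLengthMismatch {
            expected: sequence.len(),
            actual: read_len,
        });
    }

    Ok(())
}
```

Returning a typed error rather than a boolean lets a writer report which field is inconsistent before any bytes are written.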
Closes #59
This probably needs some renaming or moving things around. Let me know what you think!
I forgot to edit the async writers. Will update tomorrow!
Sorry, this is a bit outside the scope of what was initially suggested in #59. I expected the same logic used in the (BAM) SAM record writer to be ported to the BAM record writer. Can you scale it back to just that?
I agree that the file corruption issue is a different problem from writing SAM/BAM files that do not conform to the specification. Unfortunately, the record writing code is duplicated 6 times. This way, the validation is only written twice.
Fixing only the length/corruption issue does not magically make the incompatibility issue go away.
Apparently, using this branch on real-world BAM files triggers errors that do not show up in the test suite. I will investigate; I have converted the PR to WIP.
Thanks for looking at this in the past. I'm closing this since the alignment parsers/writers have now diverged greatly and make the same checks.