BS-Snper
BS-Snper copied to clipboard
CIGAR strings with multiple operations not parsed properly
I have a BAM file with reads that contain an insertion at a specific position. I noticed that the basecall counts in the --output
file produced by rrbsSnp
(and subsequently, the genotypes in the VCF file created by BS-Snper.pl
) were inaccurate around this insertion.
The CIGAR string for these reads has three operations (e.g., 143M1I7M
), but it appears that only the last operation is being processed, regardless of the number of CIGAR operations in a read. I confirmed this by checking the value of record->cigar
and record->len
at the end of sam_funcs.c:parseBuffer()
- in the example above, the values are 7M
and 7
, respectively, meaning that only 7 bases are included in the MapRecord
for this read (instead of 150), and the basecalls are recorded in the wrong reference positions.