Calculations of min-read-percent-identity and min-read-aligned-length

Open Rridley7 opened this issue 1 year ago • 2 comments

Hello, I wanted to ask what methods and calculations are being used to calculate percent identity and alignment length percentage when filtering? Is this related to the NM tag, or parsing of the cigar string? Thanks!

Rridley7, Mar 19 '23 19:03

Hi,

Sure, it is just based on the NM tag plus the lengths of the read and so on. The CIGAR string cannot be reliably used for %ID because mapping software often does not distinguish between a match and a mismatch: both are encoded as 'M'.

ben

wwood, Mar 20 '23 00:03
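
To illustrate the point about 'M' above, here is a minimal Python sketch. It is not CoverM code; the helper names (`parse_cigar`, `ungapped_cigar`, `count_mismatches`) and the toy sequences are made up for the example. It shows two ungapped alignments that get the same CIGAR even though one has mismatches, which is why an edit-distance value such as the NM tag is needed for %ID.

```python
import re

# One CIGAR operation: a run length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def ungapped_cigar(read, ref):
    """CIGAR for an ungapped alignment as many mappers write it: all 'M'.

    Hypothetical helper: 'M' covers matches and mismatches alike.
    """
    assert len(read) == len(ref)
    return f"{len(read)}M"


def count_mismatches(read, ref):
    """Mismatching positions; for an ungapped alignment this equals NM."""
    return sum(a != b for a, b in zip(read, ref))


ref = "ACGTACGTAC"
perfect = "ACGTACGTAC"  # identical to the reference
noisy = "ACGTTCGTAA"    # differs from the reference at two positions

# Both reads produce the same CIGAR, so the CIGAR alone cannot give %ID.
same_cigar = ungapped_cigar(perfect, ref) == ungapped_cigar(noisy, ref)
nm_perfect = count_mismatches(perfect, ref)
nm_noisy = count_mismatches(noisy, ref)
```

Mappers that emit the extended '=' and 'X' operations do separate matches from mismatches, but a filter cannot assume every BAM file was written that way.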

I see. So if I'm understanding correctly, % identity is calculated as (length of aligned region - NM) / (length of aligned region)? Is the length of this aligned region taken from the CIGAR, from the start and end reference positions, or from something else?

Following on from this, is min-read-aligned-length calculated as (alignment length - original read length) / (original read length)?

Rridley7, Mar 20 '23 14:03
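
For concreteness, the formula proposed in the question can be sketched in Python as follows. This is only one reading of it and is not confirmed as what CoverM does: the function names are hypothetical, the aligned length is assumed to be the number of alignment columns in the CIGAR (M, =, X, I and D), and the original read length is assumed to include soft- and hard-clipped bases.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def _sum_ops(cigar, ops):
    """Total length of the CIGAR operations whose letter is in `ops`."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in ops)


def aligned_columns(cigar):
    """Alignment columns: matches/mismatches plus inserted and deleted bases."""
    return _sum_ops(cigar, "M=XID")


def aligned_read_bases(cigar):
    """Read bases that take part in the alignment (clipping excluded)."""
    return _sum_ops(cigar, "M=XI")


def original_read_length(cigar):
    """Read length including soft- and hard-clipped bases (an assumption)."""
    return _sum_ops(cigar, "M=XISH")


def percent_identity(cigar, nm):
    """(aligned length - NM) / aligned length, as a percentage."""
    columns = aligned_columns(cigar)
    return 100.0 * (columns - nm) / columns


def aligned_percent(cigar):
    """Share of the original read that is aligned, as a percentage."""
    return 100.0 * aligned_read_bases(cigar) / original_read_length(cigar)


# Toy record: 5 soft-clipped bases, 95 aligned read bases, edit distance 4.
identity = percent_identity("5S90M2I3M", nm=4)
aligned = aligned_percent("5S90M2I3M")
```

Because NM is an edit distance that counts inserted and deleted bases as well as mismatches, the sketch puts I and D into the denominator too; using the reference span instead would give a different value whenever the alignment contains indels.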