CoverM

Use contig length for calculations, or contig length minus 150?

Open jeffkimbrel opened this issue 2 years ago • 0 comments

Hi Ben, I just wanted to get your thoughts on whether the other three metrics that use contig length should do what the mean calculation does: specifically, subtract twice the value of the --contig-end-exclusion argument (once for each end). The rationale is that since the contig ends are excluded from the mapping calculations, they shouldn't be counted in the length either. As it stands, some methods like covered_fraction can never reach 100%, because you will always come up 150 nt short compared to the full length.

This would affect the covered_fraction, reads_per_base and rpkm methods (with tpm also being affected via rpkm). Subtracting the --contig-end-exclusion lengths in those calculations would bump up the resulting values, with a bigger increase for smaller contigs. Based on some tests with covered_fraction, this does subtly change the rankings of contigs.
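To make the size effect concrete, here is a minimal Python sketch of the arithmetic. It is not CoverM's actual code: the function names are hypothetical, and the 75 nt per-end exclusion is an assumption inferred from the 150 nt figure above. It compares covered_fraction for a contig whose countable interior is fully covered, using either the full contig length or the length minus the excluded ends as the denominator.

```python
def effective_length(contig_length: int, end_exclusion: int = 75) -> int:
    """Length left after dropping `end_exclusion` bases from each end.

    The default of 75 is an assumption (2 x 75 = 150 nt), not read from CoverM.
    """
    return max(contig_length - 2 * end_exclusion, 0)


def covered_fraction(covered_bases: int, contig_length: int,
                     end_exclusion: int = 75, adjust: bool = False) -> float:
    """Hypothetical covered-fraction calculation.

    adjust=False divides by the full contig length;
    adjust=True divides by the length minus the excluded ends.
    """
    if adjust:
        denominator = effective_length(contig_length, end_exclusion)
    else:
        denominator = contig_length
    if denominator == 0:
        return 0.0
    return covered_bases / denominator


def compare(contig_length: int, end_exclusion: int = 75) -> tuple[float, float]:
    """Return (unadjusted, adjusted) fractions for a fully covered interior."""
    covered = effective_length(contig_length, end_exclusion)
    return (
        covered_fraction(covered, contig_length, end_exclusion, adjust=False),
        covered_fraction(covered, contig_length, end_exclusion, adjust=True),
    )


if __name__ == "__main__":
    for length in (1_000, 10_000, 100_000):
        full, adjusted = compare(length)
        print(f"{length:>7} bp  full-length: {full:.4f}  adjusted: {adjusted:.4f}")
```

Under these assumptions, a 1,000 bp contig tops out at 0.85 with the full-length denominator, while a 100,000 bp contig tops out at 0.9985, so the correction matters most for short contigs.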

Let me know what you think.

jeffkimbrel, Jan 26 '22 02:01