goleft
covmed overestimates coverage?
I've noticed that covmed estimates higher median coverage than other tools. For example, for a particular whole genome, covmed estimates 33.4, while Picard CollectWgsMetrics estimates 27. I've performed similar calculations on exomes, where I get a median coverage of 199.71 with covmed (using the region argument) compared with 189 using bedtools (taking the median of the per-base counts over the target region). I've found consistently higher results from covmed compared with Picard and bedtools across a number of exomes and genomes. The size of the difference varies.
I wonder if you have any idea why this is occurring?
One possibility that springs to mind for exomes in particular is that reads outside the target region could be counted and so cause it to overestimate the coverage.
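For reference, the bedtools-style calculation described above (median of per-base counts over the target region) can be sketched as follows. This is a minimal illustration, not the exact commands used: it assumes per-base depth is available as tab-separated `chrom`, `pos`, `depth` lines (the layout produced by `samtools depth`), and the function name `median_depth` and the `targets` parameter are hypothetical. Zero-depth positions need to be present in the input (e.g. `samtools depth -a`), otherwise the median is inflated.

```python
from statistics import median


def median_depth(depth_lines, targets=None):
    """Median per-base depth from 'chrom<TAB>pos<TAB>depth' lines.

    targets: optional list of (chrom, start, end) tuples with 1-based,
    inclusive coordinates. Positions outside every target are ignored,
    so off-target reads cannot contribute to the result.
    """
    depths = []
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        chrom, pos, depth = line.split("\t")[:3]
        pos = int(pos)
        if targets is not None and not any(
            chrom == t_chrom and start <= pos <= end
            for t_chrom, start, end in targets
        ):
            continue  # position is outside the target region
        depths.append(int(depth))
    if not depths:
        raise ValueError("no positions found in the requested region")
    return median(depths)


# Toy input: four positions on chr1 and one high-depth position on chr2.
lines = ["chr1\t1\t10", "chr1\t2\t20", "chr1\t3\t30", "chr1\t4\t0", "chr2\t1\t100"]
whole = median_depth(lines)
on_target = median_depth(lines, targets=[("chr1", 3, 4)])
```

Restricting to the target list is what keeps reads outside the target region from influencing the median.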
Yeah, I've noticed this as well. I'll have a look today. Picard and bedtools do actual coverage calculations across the whole BAM (I'm pretty sure, anyway), while covmed estimates from a sample, but it should still be able to give a pretty good estimate.
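To make the distinction concrete, here is a rough sketch of one way a sample-based estimate can work: take the total number of mapped reads, multiply by a typical read length taken from a sample of reads, and divide by the region length. This is an assumption for illustration only, not a description of covmed's actual implementation; the function name `estimate_coverage` and its parameters are hypothetical.

```python
from statistics import median


def estimate_coverage(mapped_reads, sampled_read_lengths, region_length):
    """Rough coverage estimate: mapped reads * typical read length / region length.

    mapped_reads: total number of mapped reads (assumed to be known up front)
    sampled_read_lengths: read lengths taken from a sample of the reads
    region_length: number of bases in the genome or target region
    """
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    if not sampled_read_lengths:
        raise ValueError("need at least one sampled read length")
    typical_length = median(sampled_read_lengths)
    return mapped_reads * typical_length / region_length


# Toy numbers: 1000 mapped reads of about 100 bp over a 10 kb region.
estimate = estimate_coverage(1000, [100, 100, 150], 10_000)
```

In a sketch like this, every mapped read contributes its full length, so the result would come out higher than a per-base calculation that drops duplicates or low-quality alignments.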
@hdashnow would you give one of the attached binaries a try? (I had to gzip them to attach them here, so you'll have to unzip and chmod +x.) This should give a more accurate estimate, but I'd like to see how it performs for your cases.
You can now do: goleft covmed *.bam
so it's easier to run on a group of bams.
goleft_osx.gz
goleft_linux64.gz
A caveat is that goleft is likely to be inaccurate for exome or targeted sequencing, but I'll improve that a bit more in the future.
Good idea adding that filter. It made the estimates slightly smaller, e.g. 33.04 instead of 33.4, but that's still nowhere near Picard.