
--cov_corr value

Open palomo11 opened this issue 7 years ago • 1 comments

Hi,

I'm using RefineM with my genomes and 24 samples.

In the readme, it is suggested that:

"If you have more than 6 data point (i.e. BAM files) comprising your coverage profiles you may wish to consider using the coverage correlation criteria (--cov_corr) instead of or in addition to this absolute error criteria"

What value would you recommend? 0.95? Is it better to combine both criteria or to use cov_corr alone?

In addition, could you explain a bit how exactly the cov_perc and the cov_corr are calculated?

Thank you very much in advance.

palomo11 avatar Feb 26 '18 19:02 palomo11

Hello.

I'm not sure about the best threshold to use. I haven't had a chance to play with data where this filtering is relevant. It really depends on how conservative you want to be. My gut feeling is something a bit more lenient than 0.95 though. Perhaps 0.8???

The median coverage of a bin is the median across all contigs comprising a bin.

cov_corr: Pearson's r between the median coverage of a bin in each sample and the coverage of an individual contig in each sample, calculated for each contig.

cov_perc: This is the mean absolute percent error in coverage between a contig and the median coverage of a bin, taken over all samples. For each sample this is given by: abs(coverage_contig - median_coverage_bin) * 100 / median_coverage_bin.
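To make the two definitions above concrete, here is a minimal Python sketch. It is not RefineM's code: the function names, the input layout (a dict mapping each contig to one coverage value per sample), the toy numbers, and the handling of zero or flat coverage are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, median


def bin_median_coverage(contig_coverages):
    """Median coverage of the bin in each sample, taken across its contigs.

    contig_coverages: dict of contig id -> list of coverages, one per sample
    (hypothetical layout, chosen for this example).
    """
    return [median(sample) for sample in zip(*contig_coverages.values())]


def cov_corr(contig_cov, bin_median):
    """Pearson's r between a contig's coverage profile and the bin's median profile."""
    mx, my = mean(contig_cov), mean(bin_median)
    dx = [x - mx for x in contig_cov]
    dy = [y - my for y in bin_median]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    # Assumption: r is undefined for a flat profile, so report NaN.
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else float("nan")


def cov_perc(contig_cov, bin_median):
    """Mean over samples of abs(contig - median) * 100 / median."""
    # Assumption: samples where the bin median is zero are skipped.
    errors = [abs(c - m) * 100 / m for c, m in zip(contig_cov, bin_median) if m]
    return mean(errors) if errors else float("nan")


# Toy bin: three contigs, four samples (made-up numbers).
coverages = {
    "contig_1": [10, 20, 30, 40],
    "contig_2": [12, 18, 33, 44],
    "contig_3": [40, 5, 30, 10],
}
bin_median = bin_median_coverage(coverages)

for name, profile in coverages.items():
    print(name, round(cov_corr(profile, bin_median), 3),
          round(cov_perc(profile, bin_median), 1))
```

In this toy bin, contig_1 tracks the bin's median profile closely (high r, small percent error), while contig_3 is negatively correlated with it and would be flagged by either criterion at any reasonable threshold.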

donovan-h-parks avatar Feb 26 '18 20:02 donovan-h-parks