RefineM
--cov_corr value
Hi,
I'm using RefineM with my genomes and 24 samples.
In the readme, it is suggested that:
"If you have more than 6 data point (i.e. BAM files) comprising your coverage profiles you may wish to consider using the coverage correlation criteria (--cov_corr) instead of or in addition to this absolute error criteria"
What value would you recommend? 0.95? Is it better to combine both criteria or to use cov_corr alone?
In addition, could you explain a bit how exactly the cov_perc and the cov_corr are calculated?
Thank you very much in advance.
Hello.
I'm not sure about the best threshold to use. I haven't had a chance to play with data where this filtering is relevant. It really depends on how conservative you want to be. My gut feeling is that something a bit more lenient than 0.95 would work, though. Perhaps 0.8?
The median coverage of a bin is the median across all contigs comprising a bin.
cov_corr: Pearson's r between the median coverage of a bin across the samples and the coverage of an individual contig across the same samples.
cov_perc: This is the mean absolute percent error in coverage between a contig and the median coverage of a bin, taken over all samples. For each sample this is given by: abs(coverage_contig - median_coverage_bin) * 100 / median_coverage_bin.
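
To make the definitions above concrete, here is a rough sketch in Python with NumPy. It is not RefineM's actual code: the function names and the toy coverage matrix are made up for illustration, and it assumes coverage is available as a contigs-by-samples array with non-zero bin medians.

```python
import numpy as np

def bin_median_coverage(contig_cov):
    """Median coverage of the bin in each sample.

    contig_cov: array of shape (n_contigs, n_samples), hypothetical layout.
    """
    return np.median(contig_cov, axis=0)

def cov_corr(contig_row, bin_median):
    """Pearson's r between one contig's coverage profile and the bin medians."""
    return float(np.corrcoef(contig_row, bin_median)[0, 1])

def cov_perc(contig_row, bin_median):
    """Mean absolute percent error between a contig and the bin medians.

    Assumes every bin median is non-zero.
    """
    per_sample = np.abs(contig_row - bin_median) * 100.0 / bin_median
    return float(np.mean(per_sample))

# Toy data: 4 contigs (rows) x 4 samples (columns); values are invented.
coverage = np.array([
    [10.0, 20.0, 30.0, 40.0],
    [12.0, 22.0, 28.0, 42.0],
    [11.0, 19.0, 31.0, 39.0],
    [40.0, 10.0, 35.0, 5.0],   # profile that does not track the rest of the bin
])

medians = bin_median_coverage(coverage)
for i, row in enumerate(coverage):
    print(i, round(cov_corr(row, medians), 3), round(cov_perc(row, medians), 1))
```

In this toy bin the first three contigs follow the bin medians closely, while the last one has a negative correlation and a large percent error, so either criterion would flag it. With only a handful of samples Pearson's r is noisy, which is presumably why the readme suggests cov_corr only when there are more than 6 BAM files.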