CoverM
Gene abundance
Hi, is it possible to use CoverM to calculate the abundance of specific genes? Would it be affected by the gene length? Thanks!
Hi,
Afraid not at this stage. Can I ask what the purpose is? If you are looking at a metagenome, then the coverage of the entire contig may be a good stand-in for the coverage of each gene. If you are looking at a metatranscriptome, then maybe give https://github.com/wwood/dirseq a try. It isn't as flexible or scalable as CoverM, but it should get the job done (and give insight into potential DNA contamination).
Thanks a lot for the reply. I was wondering whether CoverM could be applied to calculations similar to those in this paper: https://www.biorxiv.org/content/10.1101/635680v1.full.pdf (see Figure 3 or 4). Using contig-level coverage directly might not serve this aim. Do you think DirSeq would work for this purpose? I will give it a try.
Hello Ben,
I want to bring this up again: calculating the coverage of genes on a contig can be useful in some use cases I have run into. Essentially, we would supply a GFF file for the genome or contigs used in the mapping step, then extract the reads mapped to each gene's coordinates in the GFF file for coverage estimation. When the contig is long, this is essentially an interval tree problem: counting coordinate overlaps for each gene position on the contig. It differs from RNA transcript coverage because a DNA read can map half to a gene and half to an intergenic region, while RNA reads can only map to the gene region. I am wondering whether this could be an add-on option for coverage: if a GFF file is provided, gene coverage would be an additional output in genome and contig mode.
Thanks,
Jianshu
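For anyone who wants to try the idea outside CoverM in the meantime, here is a minimal sketch of the calculation described above. It is not CoverM code and does not use any CoverM option. It assumes the read alignments have already been extracted from the BAM file as 0-based, half-open (start, end) intervals on one contig, and the function names (parse_gff_genes, per_base_depth, mean_gene_coverage) are hypothetical. Instead of an interval tree it uses a simpler per-base depth array, which counts a read that only partly overlaps a gene for just the overlapping bases. Dividing by gene length gives a mean depth, so long and short genes are comparable.

```python
from itertools import accumulate


def parse_gff_genes(gff_text, feature_type="gene"):
    """Return (contig, start, end, attributes) tuples, 0-based half-open."""
    genes = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != feature_type:
            continue
        # GFF coordinates are 1-based and inclusive; convert them.
        genes.append((fields[0], int(fields[3]) - 1, int(fields[4]), fields[8]))
    return genes


def per_base_depth(contig_length, read_intervals):
    """Depth at every contig position, built from a difference array."""
    diff = [0] * (contig_length + 1)
    for start, end in read_intervals:
        start = max(start, 0)
        end = min(end, contig_length)
        if start < end:
            diff[start] += 1  # read begins covering here
            diff[end] -= 1    # read stops covering here
    return list(accumulate(diff[:-1]))


def mean_gene_coverage(depth, start, end):
    """Mean depth over the gene, i.e. covered bases divided by gene length."""
    return sum(depth[start:end]) / (end - start)


# Toy data (made up for illustration): one 20 bp contig, three reads, two genes.
gff = (
    "##gff-version 3\n"
    "contig1\ttoy\tgene\t6\t10\t.\t+\t.\tID=geneA\n"
    "contig1\ttoy\tgene\t13\t20\t.\t-\t.\tID=geneB\n"
)
reads = [(0, 10), (5, 15), (8, 12)]  # assumed already parsed from alignments
depth = per_base_depth(20, reads)
for contig, start, end, attrs in parse_gff_genes(gff):
    print(contig, attrs, mean_gene_coverage(depth, start, end))
```

On this toy input, geneA gets a mean coverage of 2.4 and geneB gets 0.375. For real data, the depth array would need to be built per contig, and an interval tree or a sorted sweep may scale better when there are many genes on very long contigs.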