
log(CPM) and TPM are so different

Open biozzq opened this issue 1 year ago • 5 comments

Dear all,

Salmon is very fast at reporting TPM and read counts for each transcript or gene, and I always use salmon + tximport + edgeR to detect differentially expressed genes. Since edgeR can output normalized read counts (CPM) and tximport can output gene-level TPM from the salmon results, I wanted to see how TPM and log2(CPM) differ. In the Spearman correlation plot attached below, the samples cluster by quantification type (TPM vs. CPM) rather than by sample.

My RNA-seq experiment has 7 biological replicates in each of two conditions, so I am planning to identify differentially expressed genes with a Wilcoxon rank-sum test on each gene's TPM or CPM. Alternatively, I could keep only the genes that are called differentially expressed by both edgeR and the Wilcoxon rank-sum test. I would like to hear your suggestions.
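To make the comparison concrete, here are the two quantities as I understand them. The symbols are my own notation, not taken from the salmon or edgeR documentation: $c_{gj}$ is the read count of gene $g$ in sample $j$, $\ell_{gj}$ is its effective length, and $N_j = \sum_h c_{hj}$ is the library size (edgeR additionally scales $N_j$ by the TMM normalization factor).

$$
\mathrm{CPM}_{gj} = 10^6 \cdot \frac{c_{gj}}{N_j},
\qquad
\mathrm{TPM}_{gj} = 10^6 \cdot \frac{c_{gj}/\ell_{gj}}{\sum_h c_{hj}/\ell_{hj}}
$$

$$
\frac{\mathrm{TPM}_{gj}}{\mathrm{CPM}_{gj}} = \frac{1}{\ell_{gj}} \cdot \frac{N_j}{\sum_h c_{hj}/\ell_{hj}}
$$

If this is right, then within one sample the two measures differ by a gene-specific factor $1/\ell_{gj}$ (times a per-sample constant), so the ranking of genes within a sample changes between TPM and CPM. That alone might explain why a Spearman correlation computed across genes groups the columns by quantification type.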

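library(edgeR)

# genelength is the per-sample gene length matrix generated by salmon + tximport
y <- DGEList(counts=data, group=group, genes=genelength)
# drop lowly expressed genes, then recompute library sizes
keep <- filterByExpr(y)
y <- y[keep,,keep.lib.sizes=FALSE]
# TMM normalization factors, then log2(CPM)
y <- calcNormFactors(y)
logcpm <- cpm(y, log=TRUE, prior.count=1)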
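
For reference, this is roughly the Wilcoxon step I have in mind. It is only a minimal sketch in base R on a simulated matrix, so that it runs without my data. The names `expr`, `pvals`, `fdr` and `wilcox_hits` are made up for illustration, and in practice `expr` would be the `logcpm` (or TPM) matrix.

```r
# Minimal sketch: per-gene Wilcoxon rank-sum test on a log2(CPM)-like matrix.
# The matrix is simulated here; it stands in for the real logcpm/TPM matrix.
set.seed(1)
n_genes <- 200
group <- factor(rep(c("A", "B"), each = 7))  # 7 replicates per condition

# Hypothetical expression matrix: genes in rows, samples in columns.
expr <- matrix(rnorm(n_genes * 14, mean = 5, sd = 1), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("s", seq_len(14))))
# Shift the first 20 genes in condition B so there is something to detect.
expr[1:20, group == "B"] <- expr[1:20, group == "B"] + 3

# One two-sided rank-sum test per gene.
pvals <- apply(expr, 1, function(x) {
  wilcox.test(x[group == "A"], x[group == "B"])$p.value
})

# Benjamini-Hochberg correction across genes.
fdr <- p.adjust(pvals, method = "BH")
wilcox_hits <- names(fdr)[fdr < 0.05]
```

The overlap with edgeR would then be something like `intersect(wilcox_hits, edger_hits)`, where `edger_hits` is a hypothetical vector of gene names from the edgeR results. One thing I noticed: with 7 vs. 7 samples the smallest possible exact two-sided p-value is 2/choose(14, 7), about 0.00058, which may limit how many genes survive multiple-testing correction.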

tpm_cpm_corr-spearman.pdf

Thank you in advance.

Best regards, Zheng zhuqing

biozzq avatar Nov 22 '22 14:11 biozzq