
Question about the propr `select` argument

Open JemmaSun opened this issue 3 years ago • 3 comments

Hi! I am trying to use propr for a co-occurrence analysis. I noticed that the log-ratio values calculated by propr (the @logratio slot of the output) are sometimes quite different from those calculated by log(x[i]/exp(mean(log(x)))), where x is a list of counts, as described in your paper. Could you please tell me which R function you used to calculate the log-ratio? If I have misunderstood the formula, please let me know. Thank you in advance!

JemmaSun avatar Oct 30 '21 04:10 JemmaSun

Hi Jemma, thanks for your interest in propr. For a data set where rows are samples and columns are features, the CLR should be performed row-wise. So in your case, are you calculating log(x[i]/exp(mean(log(x)))) where x is a single sample (one row)? They seem to match for me :-)

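# Toy data: 5 samples (rows) x 6 features (columns)
dat <- matrix(runif(30),5,6)
library(propr)
pr <- propr(dat)
A <- pr@logratio  # log-ratio values as computed by propr

# Manual row-wise CLR for comparison
B <- t(apply(dat, 1, function(x) log(x / exp(mean(log(x))))))
A
B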

tpq avatar Oct 30 '21 05:10 tpq

Hi tpq,

Thanks for your quick reply. Yes, the x I meant was a list of counts for a certain sample. My current dataset has 12 samples/rows and over 10,000 OTUs/columns.

I have used propr on smaller datasets before, and the log-ratio values given by propr were exactly the same as those given by log(x/exp(mean(log(x)))). However, I got totally different log-ratios this time with my current dataset. I attached one of my samples and its counts (they are not true counts; they are closer to relative abundances, though not exactly) in the file "SRR_counts.csv". Using this sample as an example, the geometric mean is 0.06017464, so the log-ratio for the first taxon, "Root; d__Archaea", should be log(336.5/0.06017464) = 8.629102, whereas the log-ratio value calculated by propr for this taxon was 11.56008.

Wait... I guess I found the reason. Before I ran perb(), I applied a filter to keep only the columns that are abundant in at least 2 samples. When I then ran perb(data, select=keep), the log-ratio values were calculated from the untrimmed data set, whereas I was calculating the geometric mean from the trimmed data. I think I've figured it out :-D Sorry for the trouble.

SRR_counts.csv

JemmaSun avatar Oct 30 '21 08:10 JemmaSun

Yes, that'd be it!

tpq avatar Nov 22 '21 00:11 tpq
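
For anyone landing here with the same mismatch, here is a minimal base-R sketch of the effect. It does not call propr. It only assumes, as the thread concludes, that the log-ratio transform is applied to the full table before the `select` subset is taken. The helper `clr()`, the toy matrix `dat`, and the selection `keep` are illustrative names, not part of the propr API.

```r
# Toy data: 4 samples (rows) x 6 features (columns); all values are made up.
set.seed(1)
dat <- matrix(runif(24, min = 1, max = 100), nrow = 4, ncol = 6)
colnames(dat) <- paste0("otu", 1:6)

# Row-wise CLR: log of each value minus the mean log of its own sample.
# This equals log(x / exp(mean(log(x)))) applied to each row.
clr <- function(m) {
  logm <- log(m)
  sweep(logm, 1, rowMeans(logm), "-")
}

keep <- c("otu1", "otu3", "otu4")  # hypothetical feature selection

# (a) Transform the full table, then subset (the behaviour inferred in this thread).
clr_then_subset <- clr(dat)[, keep]

# (b) Subset first, then transform (the manual calculation on trimmed data).
subset_then_clr <- clr(dat[, keep])

# Within each sample, (a) and (b) differ by one constant: the mean log of the
# trimmed features minus the mean log of all features in that sample.
offset <- clr_then_subset - subset_then_clr
print(round(offset, 4))
```

Each row of `offset` holds a single repeated value, so the two approaches disagree by a per-sample shift rather than by anything feature-specific. In the example from this thread, that shift would be 11.56008 - 8.629102, or about 2.93, for the attached sample.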