
LogFC Clarification

Open christopher-hardy opened this issue 4 years ago • 4 comments

Hi,

Thank you for your work; it is really remarkable.

I'm not sure whether this is a bug or a feature, but when comparing presto::wilcoxauc with Seurat::FindMarkers I noticed a difference in the logFC calculation. When Seurat calculates the logFC from a Seurat object, the log-normalized values are first back-transformed (expm1), the group means are computed on that scale, and the means are then log-transformed again after adding a pseudocount (default 1) to avoid infinite logFCs (Line 551 @ https://github.com/satijalab/seurat/blob/master/R/differential_expression.R). presto appears to average the values as supplied, so both avgExpr and logFC differ from the Seurat output. Could you clarify whether you consider your approach correct, or whether this is a potential issue with an edge case when the data have already been log-transformed?

Note: If helpful, I am able to replicate the Seurat values with a couple of slight modifications (see lines 1, 7 and 11 below).

  X <- expm1(X)
  group_sums <- sumGroups(X, y, 1)
  group_nnz <- nnzeroGroups(X, y, 1)
  group_pct <- sweep(group_nnz, 1, as.numeric(table(y)), "/") %>% t()
  group_pct_out <- -group_nnz %>% sweep(2, colSums(group_nnz), 
                                        "+") %>% sweep(1, as.numeric(length(y) - table(y)), "/") %>% t()
  group_means <- log(sweep(group_sums, 1, as.numeric(table(y)), "/") + 1) %>% t()
  cs <- colSums(group_sums)
  gs <- as.numeric(table(y))
  lfc <- Reduce(cbind, lapply(seq_len(length(levels(y))), function(g) {
    group_means[, g] - (log((cs - group_sums[g, ]) / (length(y) - gs[g]) + 1))
  }))
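
To show why the two calculations disagree, here is a minimal toy sketch for a single gene. The expression values are made up, and the "mean of logs" line assumes that wilcoxauc simply takes the difference of group means on whatever scale it is given; the "log of means" line follows the Seurat behaviour described above (expm1, average, add a pseudocount of 1, re-log).

```r
# Hypothetical log1p-normalized expression for one gene in two groups.
in_group  <- c(0, 0, log1p(10))
out_group <- c(log1p(1), log1p(1), log1p(1))

# Difference of means taken directly on the log-scale values
# (assumed to match what wilcoxauc reports for log-normalized input).
lfc_mean_of_logs <- mean(in_group) - mean(out_group)

# Seurat-style, as described above: undo the log, average,
# add a pseudocount of 1, then take the log again.
lfc_log_of_means <- log(mean(expm1(in_group)) + 1) -
  log(mean(expm1(out_group)) + 1)

print(c(mean_of_logs = lfc_mean_of_logs, log_of_means = lfc_log_of_means))
```

Because the log of a mean is generally not equal to the mean of the logs, the two numbers differ whenever expression varies within a group.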

Thanks!

Chris

christopher-hardy, May 12 '20 17:05