rank_genes_groups logfoldchange different than seurat

Open nagaBoulevard opened this issue 4 years ago • 1 comments

Hi! I've noticed that the function rank_genes_groups calculates logFC differently than Seurat. https://github.com/theislab/scanpy/blob/5800db45dde0c2060ad0a9bed8ea931bd41da936/scanpy/tools/_rank_genes_groups.py#L207-L208

https://github.com/theislab/scanpy/blob/5800db45dde0c2060ad0a9bed8ea931bd41da936/scanpy/tools/_rank_genes_groups.py#L223

Thus the equation is log(exp(mean(values)))

while in Seurat it is https://github.com/satijalab/seurat/blob/96d07d80bc4b6513b93e9c10d8a9d57ae7016f9f/R/differential_expression.R#L175-L179

thus log(mean(exp(values)))

I was thus wondering if this was intended, since it leads to different logFC values.
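
To make the difference concrete, here is a minimal sketch in NumPy that applies both orders to the same toy data. The function names, the toy values, the log2 base, and the small `eps` are assumptions chosen for illustration; neither function is the actual scanpy or Seurat implementation, and each library uses its own pseudocount and log base.

```python
import numpy as np

# Toy raw expression values (hypothetical): group mean is 4, rest mean is 2,
# so the fold change of the raw averages is 2 (log2 fold change of 1).
raw_group = np.array([0.0, 0.0, 8.0, 8.0])
raw_rest = np.array([2.0, 2.0, 2.0, 2.0])

# Both tools work on log1p-transformed values.
log_group = np.log1p(raw_group)
log_rest = np.log1p(raw_rest)


def logfc_mean_then_exp(group, rest, eps=1e-9):
    """Average the log1p values first, then back-transform (order described for scanpy)."""
    return np.log2((np.expm1(np.mean(group)) + eps) / (np.expm1(np.mean(rest)) + eps))


def logfc_exp_then_mean(group, rest, eps=1e-9):
    """Back-transform each value first, then average (order described for Seurat)."""
    return np.log2((np.mean(np.expm1(group)) + eps) / (np.mean(np.expm1(rest)) + eps))


fc_mean_first = logfc_mean_then_exp(log_group, log_rest)
fc_exp_first = logfc_exp_then_mean(log_group, log_rest)

print(f"mean then exp: {fc_mean_first:.4f}")  # approximately 0
print(f"exp then mean: {fc_exp_first:.4f}")   # approximately 1
```

On this toy data, averaging in log space reports no change at all, while averaging the back-transformed values recovers the twofold difference in raw means. The gap between the two grows with the variance of the values inside each group, so sparse or bimodal genes are affected the most.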

nagaBoulevard avatar Oct 07 '19 21:10 nagaBoulevard

I ran into this problem too. logFC is still calculated this way, and I am not satisfied with it. When we talk about the average fold change of gene expression, what is expected is the fold change of the non-logged average expression. That gives people an intuitive feeling for how many times more a gene is expressed in one group than in another. So the expm1() step must be done before the mean() step. Swapping this order not only changes the logFC value, it also loses the biological meaning and doesn't make sense.

Superoxide-Dismutase avatar Apr 26 '22 07:04 Superoxide-Dismutase