scvi-tools
Differential expression decision function
Hi,
I was investigating a case where the mean fold change was very small, but proba_de was relatively large.
The current decision function to consider a posterior sample to support a DE hypothesis is np.abs(lfc_sample) >= delta. If the posterior distribution of lfc is wide and centered around 0, this assigns a large number of samples as DE. I see how this makes sense: if the posterior is uncertain, there's a large chance the gene was actually DE, but we can't say in which direction.
In this example the mean of the fold changes is very small, but proba_de is 0.7.
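The effect can be reproduced with synthetic draws. This is only a sketch: the normal distribution, its parameters, and delta = 0.25 are assumptions chosen for illustration, not values from scvi-tools or from the gene discussed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed threshold, for illustration only.
delta = 0.25

# Synthetic stand-in for posterior LFC samples of one gene:
# a wide distribution centered close to 0.
lfc_samples = rng.normal(loc=0.05, scale=1.0, size=10_000)

# Current two-sided rule: a sample supports DE if |lfc| >= delta.
proba_de = np.mean(np.abs(lfc_samples) >= delta)
mean_lfc = lfc_samples.mean()

print(f"mean lfc = {mean_lfc:.3f}, proba_de = {proba_de:.3f}")
```

The mean stays near zero while most samples land outside [-delta, delta], so proba_de comes out large even though the direction of the effect is undetermined.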
I think it might make more sense to test two different hypotheses for each gene by splitting the samples into three regions: lfc_sample <= -delta, -delta < lfc_sample < delta, and delta <= lfc_sample. Samples then fall into three different classes, two of which are the interesting DE cases.
Now the hypothesis "Is negative DE" has probability 0.26, and the hypothesis "Is positive DE" has probability 0.45.
To simplify results I think the output could return max(is_negative_de, is_positive_de) as proba_de.
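A rough sketch of what I mean, in plain NumPy. The function name directional_proba_de and the returned keys are made up for this post and are not part of the scvi-tools API; counting a sample exactly at +delta or -delta as DE is also an assumption, chosen to match the current >= rule.

```python
import numpy as np


def directional_proba_de(lfc_samples, delta):
    """Split posterior LFC samples into negative DE, not DE, and positive DE.

    Hypothetical helper for illustration; not part of scvi-tools.
    """
    lfc_samples = np.asarray(lfc_samples, dtype=float)

    # Fraction of samples supporting each directional hypothesis.
    is_negative_de = float(np.mean(lfc_samples <= -delta))
    is_positive_de = float(np.mean(lfc_samples >= delta))

    return {
        "is_negative_de": is_negative_de,
        "is_positive_de": is_positive_de,
        # Samples strictly inside (-delta, delta).
        "proba_not_de": 1.0 - is_negative_de - is_positive_de,
        # Proposed summary: the better supported of the two directions.
        "proba_de": max(is_negative_de, is_positive_de),
    }


# Toy samples, for illustration only.
toy_samples = [-1.0, -0.5, -0.1, 0.0, 0.1, 0.3, 0.6, 0.7, 0.9, 1.2]
result = directional_proba_de(toy_samples, delta=0.25)
two_sided = float(np.mean(np.abs(np.asarray(toy_samples)) >= 0.25))
print(result, two_sided)
```

On the toy samples the two-sided rule gives 0.7, while the directional split gives 0.2 for negative DE and 0.5 for positive DE, so the reported proba_de would drop to 0.5.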
What do you think?
/Valentine
Hi Valentine, this is a really useful feature to filter out pathological genes. I am reordering the codebase related to DE to prepare for upcoming changes and included your feature here: #1360