Differential expression decision function

Open · vals opened this issue 3 years ago · 1 comment

Hi,

I was investigating a case where the mean fold change was very small, but the proba_de was relatively large.

The current decision function for counting a posterior sample as support for the DE hypothesis is np.abs(lfc_sample) >= delta. If the posterior distribution of lfc is wide and centered around 0, this counts a large number of samples as DE. I see how this makes sense: if the posterior is uncertain, there's a large chance the gene was actually DE, but we can't say in which direction.
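To make the effect concrete, here is a minimal sketch using simulated samples rather than real model output. The function name proba_de_two_sided, the value delta=0.25, and the normal "posteriors" are assumptions chosen for illustration only; this is not the scvi-tools implementation.

```python
import numpy as np


def proba_de_two_sided(lfc_samples, delta=0.25):
    """Fraction of posterior LFC samples with |lfc| >= delta.

    Hypothetical helper mirroring the rule np.abs(lfc_sample) >= delta.
    """
    lfc_samples = np.asarray(lfc_samples, dtype=float)
    return float(np.mean(np.abs(lfc_samples) >= delta))


rng = np.random.default_rng(0)

# Simulated posteriors, both centered at 0 (assumed shapes, not real data).
wide = rng.normal(loc=0.0, scale=1.0, size=10_000)     # very uncertain
narrow = rng.normal(loc=0.0, scale=0.05, size=10_000)  # confident, no change

p_wide = proba_de_two_sided(wide)
p_narrow = proba_de_two_sided(narrow)

print(f"wide:   mean lfc = {wide.mean():+.3f}, proba_de = {p_wide:.2f}")
print(f"narrow: mean lfc = {narrow.mean():+.3f}, proba_de = {p_narrow:.2f}")
```

Both simulated posteriors have a mean near zero, but only the wide one gets a high proba_de, because most of its mass sits outside [-delta, delta] on one side or the other.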

In this example the mean of the fold changes is very small, but proba_de is 0.7. [image]

I think it might make more sense to test two directional hypotheses for each gene by splitting the samples into three classes: lfc_sample <= -delta, -delta < lfc_sample < delta, and lfc_sample >= delta. Samples then fall into three different classes, two of which are interesting DE cases: [image]

Now the hypothesis "Is negative DE" has probability 0.26, and the hypothesis "Is positive DE" has probability 0.45.

To simplify results I think the output could return max(is_negative_de, is_positive_de) as proba_de.
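A minimal sketch of this proposal, assuming the posterior LFC samples for one gene are available as a 1-D array. The function name directional_de_probabilities, the dictionary keys, and delta=0.25 are hypothetical choices for illustration, not the scvi-tools API, and samples exactly equal to delta are counted as positive DE here.

```python
import numpy as np


def directional_de_probabilities(lfc_samples, delta=0.25):
    """Split posterior LFC samples into three classes and summarize them.

    Hypothetical helper: returns the fraction of samples in each class and
    max(is_negative_de, is_positive_de) as a directional proba_de.
    """
    lfc = np.asarray(lfc_samples, dtype=float)
    is_negative_de = float(np.mean(lfc <= -delta))
    is_positive_de = float(np.mean(lfc >= delta))
    is_not_de = 1.0 - is_negative_de - is_positive_de
    return {
        "is_negative_de": is_negative_de,
        "is_not_de": is_not_de,
        "is_positive_de": is_positive_de,
        "proba_de": max(is_negative_de, is_positive_de),
    }


# Toy samples (made up for illustration, not from a fitted model).
toy_lfc = np.array([-1.0, -0.5, 0.0, 0.1, 0.3, 0.6, 1.0, -0.1, 0.25, 0.0])
result = directional_de_probabilities(toy_lfc, delta=0.25)

# Two-sided rule on the same samples, for comparison.
two_sided = float(np.mean(np.abs(toy_lfc) >= 0.25))

print(result)
print("two-sided proba_de:", two_sided)
```

With this rule, a gene only gets a high proba_de when most of the posterior mass agrees on one direction, so the directional value can never exceed the two-sided one.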

What do you think?

/Valentine

vals avatar Feb 09 '22 05:02 vals

Hi Valentine, this is a really useful feature to filter out pathological genes. I am reordering the codebase related to DE to prepare for upcoming changes and included your feature here: #1360

PierreBoyeau avatar May 13 '22 02:05 PierreBoyeau