
How can I get the top genes for a celltype?

Open jemorlanes opened this issue 1 year ago • 2 comments

Hello Alma!! :)

Hope everything is going great! I have a quick question about some of the outputs of stereoscope. I am interested in looking at the genes that stereoscope has decided are the most descriptive of a certain cell type, but I am struggling to get this.

In the output of stereoscope, I find two files that could be of interest: the R*.tsv file, which stores a rate for each gene in each cell type, and the logits*.tsv file, which, from my understanding, indicates how good an explanatory variable each gene is.

To get the exact "weights" of a gene for a cell type, should I multiply the rates matrix by the logits matrix?

Thank you for your help! :))

jemorlanes, Apr 27 '23 08:04

Hi @jemorlanes,

thanks for using stereoscope and reaching out. What you could do is compute the expected value of a given gene within every cell type. If you look at the definition of the negative binomial mean, you see that it's given as mean = r(1 - p)/p, while logits = log((1 - p)/p). Hence, mean_gz = r_gz * exp(logits_gz). In Python, this would look something like:
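As a quick numerical sanity check of the identity mean = r * exp(logits), here is a minimal sketch using NumPy's negative binomial sampler. NumPy counts failures before `n` successes, so its mean is n(1 - p)/p, the same convention as above. The values of `r` and `p` are arbitrary illustration values, not anything produced by stereoscope.

```python
import numpy as np

# Arbitrary illustration values for the rate r and probability p
r, p = 3.0, 0.4

# logits as defined above: log((1 - p) / p)
logit = np.log((1 - p) / p)

# Expected value via the identity mean = r * exp(logits); here 3 * 1.5
mean_from_logits = r * np.exp(logit)

# Empirical mean of NB samples; NumPy's parameterization has mean r(1 - p)/p
rng = np.random.default_rng(0)
samples = rng.negative_binomial(r, p, size=200_000)
empirical_mean = samples.mean()

print(mean_from_logits, empirical_mean)
```

The two printed numbers should agree up to Monte Carlo noise.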

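import glob
import numpy as np
import pandas as pd
R = pd.read_csv(glob.glob("R*.tsv")[0], sep="\t", index_col=0)  # rates, genes x cell types; glob takes the first matching file
logits = pd.read_csv(glob.glob("logits*.tsv")[0], sep="\t", index_col=0)  # files are tab-separated
mean = R * np.exp(logits) if logits.shape == R.shape else R.mul(np.exp(logits.iloc[:, 0]), axis=0)  # if there is one logit per gene, broadcast it across cell types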

You could then extract the genes that are most highly expressed within each cell type. However, the most highly expressed genes aren't necessarily the most descriptive ones. For that kind of information you need a contrastive analysis, essentially a differential gene expression (DGE) analysis but with only one sample per cell type. That is also easy to run once you have the expected values.
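The first step, picking the most highly expressed genes per cell type, can be sketched as below. The table is a made-up stand-in for the `mean` DataFrame (genes x cell types), and `top_expressed` is a hypothetical helper, not part of stereoscope.

```python
import pandas as pd

# Made-up expected values (genes x cell types), standing in for `mean`
mean = pd.DataFrame(
    {"T_cell": [50.0, 2.0, 30.0, 1.0], "B_cell": [45.0, 40.0, 1.0, 3.0]},
    index=["GeneA", "GeneB", "GeneC", "GeneD"],
)

def top_expressed(mean: pd.DataFrame, n: int = 2) -> dict:
    """Return the n genes with the highest expected value in each cell type."""
    return {ct: mean[ct].nlargest(n).index.tolist() for ct in mean.columns}

print(top_expressed(mean))
# {'T_cell': ['GeneA', 'GeneC'], 'B_cell': ['GeneA', 'GeneB']}
```

Note that GeneA ranks first in both cell types, which illustrates the point above: high expression alone does not make a gene descriptive of one particular cell type.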

Best, Alma

almaan, Apr 27 '23 17:04

Hi Alma!

Super insightful, thank you! When you say "a DGE with only one sample per cell type", do you mean:

  • Get the expected mean for each gene in each cell type.
  • Run a DGE analysis between the cell types using those expected means.
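One simple way to carry out these two steps is sketched below. It is an assumption about what such a contrast could look like, not necessarily what Alma had in mind and not anything built into stereoscope. With a single expected value per gene and cell type there are no replicates and hence no p-values, so each gene is scored by the log2 fold change of its expected value in one cell type versus the average over all other cell types. The helper `one_vs_rest_lfc`, the pseudocount, and the data are hypothetical.

```python
import numpy as np
import pandas as pd

def one_vs_rest_lfc(mean: pd.DataFrame, pseudocount: float = 1.0) -> pd.DataFrame:
    """log2 fold change of each cell type vs. the average of all other cell types.

    `mean` is a genes x cell types table of expected values (step 1).
    The pseudocount avoids division by zero and damps tiny values.
    """
    lfc = {}
    for ct in mean.columns:
        rest = mean.drop(columns=ct).mean(axis=1)  # average over the other cell types
        lfc[ct] = np.log2((mean[ct] + pseudocount) / (rest + pseudocount))
    return pd.DataFrame(lfc, index=mean.index)

# Made-up expected values for illustration (genes x cell types)
mean = pd.DataFrame(
    {"T_cell": [50.0, 2.0, 30.0], "B_cell": [45.0, 40.0, 1.0], "NK_cell": [48.0, 1.0, 25.0]},
    index=["GeneA", "GeneB", "GeneC"],
)

lfc = one_vs_rest_lfc(mean)
# Step 2: for each cell type, take the gene most specifically expressed there
top_markers = {ct: lfc[ct].idxmax() for ct in lfc.columns}
print(top_markers)
# {'T_cell': 'GeneC', 'B_cell': 'GeneB', 'NK_cell': 'GeneC'}
```

Unlike the ranking by raw expression, GeneA, which is high everywhere, no longer comes out on top. Comparing against the average of the other cell types is one choice; comparing against the maximum of the others would give a stricter notion of specificity.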

jemorlanes, May 09 '23 10:05