stereoscope
How can I get the top genes for a celltype?
Hello Alma!! :)
Hope everything is going great! I have a quick question about some of the outputs of stereoscope. I am interested in the genes that stereoscope has decided are the most descriptive of a certain celltype, but I am struggling to get this information.
In the output of stereoscope, I find 2 files that could be of interest: the R*.tsv file, which stores rates for each gene for each celltype, and the logits*.tsv file, which from my understanding gives an indication of how good an explanatory variable each gene is.
In order to get the exact "weights" of a gene for a celltype, should I multiply the rates matrix by the logits matrix?
Thank you for your help! :))
Hi @jemorlanes,
thanks for using stereoscope and reaching out. What you could do is compute the expected value of a given gene within every cell type. If you look at the definition of the mean here, you see that it's given as mean = r(1-p)/p, while logits = log((1-p)/p). Hence, you have mean_gz = r_gz * exp(logits_gz). Pseudocode for this would be something like (in Python):
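import numpy as np
import pandas as pd
# The files are tab-separated; replace the * with your actual file names.
R = pd.read_csv("R*.tsv", sep="\t", header=0, index_col=0)
logits = pd.read_csv("logits*.tsv", sep="\t", header=0, index_col=0)
mean = R * np.exp(logits)  # expected expression; assumes both tables share row and column labels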
You could then extract the genes that seem to be most highly expressed within that cell type. However, the most highly expressed genes aren't necessarily the most descriptive ones. For that kind of information you need a contrastive analysis: essentially a DGE, but with only one sample per cell type. That's also easy to execute once you have the expected values.
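As a rough illustration of the extraction step, here is a minimal sketch. It assumes the table of expected values has genes as rows and cell types as columns (transpose it first if your files are laid out the other way). The helper name `top_expressed_genes` and the toy values are made up for the example and are not part of stereoscope.

```python
import pandas as pd

def top_expressed_genes(mean: pd.DataFrame, cell_type: str, n: int = 10) -> pd.Series:
    """Return the n genes with the highest expected expression in one cell type.

    Assumes `mean` has genes as rows and cell types as columns.
    """
    return mean[cell_type].nlargest(n)

# Toy values standing in for R * np.exp(logits); not real stereoscope output.
mean = pd.DataFrame(
    {"T_cell": [5.0, 0.5, 9.0], "B_cell": [4.0, 8.0, 9.5]},
    index=["geneA", "geneB", "geneC"],
)

top = top_expressed_genes(mean, "T_cell", n=2)
print(top)
```

In the toy table, geneC ranks first for T_cell but is just as high in B_cell, which is why a ranking by expression alone does not say much about how specific a gene is.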
Best, Alma
Hi Alma!
Super insightful, thank you! When you say "a DGE with only one sample per celltype", do you mean the following?
- Get the expected mean for each gene in each celltype.
- Run DGE between the celltypes using that expected mean.
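One possible reading of these two steps, not confirmed in this thread, is sketched below. With a single expected value per gene and cell type there is no variance to test against, so the sketch swaps a real DGE test for a plain log2 fold-change ranking: each gene's expected value in the cell type of interest is compared with its average over the other cell types. The function name `contrast_scores`, the `eps` pseudocount, and the toy table are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

def contrast_scores(mean: pd.DataFrame, cell_type: str, eps: float = 1e-8) -> pd.Series:
    """Rank genes by log2(expected value in cell_type / average over the other cell types).

    Assumes `mean` has genes as rows and cell types as columns.
    `eps` is a small pseudocount that avoids division by zero.
    """
    target = mean[cell_type]
    rest = mean.drop(columns=cell_type).mean(axis=1)
    scores = np.log2((target + eps) / (rest + eps))
    return scores.sort_values(ascending=False)

# Toy expected values (genes x cell types); not real stereoscope output.
mean = pd.DataFrame(
    {
        "T_cell": [5.0, 0.5, 9.0],
        "B_cell": [4.0, 8.0, 9.5],
        "NK_cell": [1.0, 6.0, 9.0],
    },
    index=["geneA", "geneB", "geneC"],
)

scores = contrast_scores(mean, "T_cell")
print(scores)
```

Here geneA comes out on top for T_cell even though geneC has the highest expected value, because geneC is equally high everywhere. Other contrasts (for example, against the maximum of the other cell types instead of their average) would be just as reasonable.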