mean expression and percentage

Open wangjiawen2013 opened this issue 5 years ago • 6 comments

Hello, is there a function that returns the mean expression and the percentage of expressing cells for each gene in a cluster? scanpy.api.pl.dotplot() computes this information implicitly, so the easiest way might be for it to return a table and not only the plot.

By the way, can the plots generated by scanpy be saved as vector graphics? Currently the cell points in the plot are not in vector format and become pixelated when zoomed in, although the text and axes are in vector format.

wangjiawen2013, Oct 30 '18 06:10

@fidelram are you calling an implicit function summarize_categorical or something that could be exposed to the user as a tool?

@wangjiawen2013 sc.set_figure_params(vector_friendly=False) does what you want: https://scanpy.readthedocs.io/en/latest/api/index.html#settings

falexwolf, Nov 05 '18 01:11
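To illustrate what the vector_friendly setting is about, here is a minimal matplotlib-only sketch (not scanpy code). It assumes that vector_friendly=True corresponds to rasterizing the scatter points while keeping text and axes as vectors, which matches the symptom described in the question. The helper name scatter_svg and the random data are made up for the example.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def scatter_svg(rasterized: bool) -> str:
    """Draw a small scatter plot and return it as SVG text (hypothetical helper)."""
    rng = np.random.default_rng(0)
    xy = rng.normal(size=(200, 2))
    fig, ax = plt.subplots(figsize=(3, 3))
    # rasterized=True embeds the points as a bitmap inside the vector file
    ax.scatter(xy[:, 0], xy[:, 1], s=5, rasterized=rasterized)
    buf = io.BytesIO()
    fig.savefig(buf, format="svg")
    plt.close(fig)
    return buf.getvalue().decode("utf-8")


raster_svg = scatter_svg(rasterized=True)   # points become an embedded image
vector_svg = scatter_svg(rasterized=False)  # points stay as vector paths
```

With rasterized points the SVG contains an embedded image element, which is what turns blocky when zoomed; without rasterization every point is a path and scales cleanly, at the cost of larger files for plots with many cells.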

I have got what I want with the following code adapted from dotplot():

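    import pandas as pd

    # All genes in the raw matrix and the list of cluster names
    gene_ids = adata.raw.var.index.values
    clusters = adata.obs['louvain'].cat.categories
    # Dense cells x genes matrix (assumes adata.raw.X is sparse)
    obs = adata.raw[:, gene_ids].X.toarray()
    obs = pd.DataFrame(obs, columns=gene_ids, index=adata.obs['louvain'])
    # Mean expression per cluster
    average_obs = obs.groupby(level=0).mean()
    # Fraction of cells per cluster with non-zero expression
    obs_bool = obs.astype(bool)
    fraction_obs = obs_bool.groupby(level=0).sum() / obs_bool.groupby(level=0).count()
    average_obs.T.to_csv("average.csv")
    fraction_obs.T.to_csv("fraction.csv")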

wangjiawen2013, Nov 05 '18 04:11
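The same computation can be sketched as a small reusable function on plain pandas objects. This is only an illustration under stated assumptions: summarize_by_group is a hypothetical name, the input is a dense cells x genes DataFrame (for an AnnData object it would have to be built from adata.raw first, as in the snippet above), and "expressed" is taken to mean a value greater than zero, which only makes sense on raw or otherwise non-scaled data.

```python
import pandas as pd


def summarize_by_group(expr: pd.DataFrame, groups: pd.Series):
    """Return (mean expression, fraction of expressing cells) per group.

    expr   : cells x genes matrix of non-negative expression values
    groups : one group label per cell, aligned with the rows of expr
    """
    labels = groups.to_numpy()
    mean_expr = expr.groupby(labels).mean()
    # The mean of a 0/1 indicator is the fraction of cells with expression > 0
    frac_expr = (expr > 0).astype(float).groupby(labels).mean()
    return mean_expr, frac_expr


# Toy data: four cells, two genes, two clusters
expr = pd.DataFrame(
    {"GeneA": [0.0, 2.0, 4.0, 6.0], "GeneB": [1.0, 0.0, 0.0, 0.0]},
    index=["cell1", "cell2", "cell3", "cell4"],
)
clusters = pd.Series(["0", "0", "1", "1"], index=expr.index, name="louvain")

mean_expr, frac_expr = summarize_by_group(expr, clusters)
```

Both results have clusters as rows and genes as columns; transposing them with .T gives the genes x clusters layout written to CSV in the snippet above.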

I could modify dotplot to return this information. Initially, I thought that the data used by the dot plot was too ad hoc because the percentage (size of the dot) is based on the dropouts, which is only meaningful on the raw matrix. However, I keep finding this information useful to eyeball potential markers expressed only in a single cluster.

fidelram, Nov 05 '18 08:11

I would also be interested in a version which delivers the information shown in the dotplot! Would be extremely useful for automatic cluster annotation.

lisbeth-dot-95, Sep 25 '20 10:09

Yes, absolutely. Getting back the summarized information from the dotplot would be great!

alevax, Oct 07 '20 23:10

Agreed. Adding a specialized function that returns the mean expression and percentage of given genes in each cluster would be very useful.

QiangShiPKU, Apr 29 '22 07:04

Quoting wangjiawen2013's snippet above ("I have got what I want with the following code adapted from dotplot()"):

Love this! Thanks a lot! Just one question: is there a way to get the average expression per cell type (cluster label 1) within each sample (cluster label 2) from an integrated object? I would like to get something roughly like this:

                Gene 1                           Gene 2                 ...
          sample1  sample2  sample3    sample1  sample2  sample3    ...
T-cell
B-cell
...

I am not sure if this makes sense, but I have been trying to do this for a while and nothing worked!

Qtasnim, Dec 02 '22 02:12
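One possible way to get the layout asked for above is to group by both labels and then pivot the sample level into the columns. The sketch below uses plain pandas on toy data; the column names cell_type and sample are assumptions, and with a real AnnData object the expression matrix and the two label columns would first have to be collected into one DataFrame.

```python
import pandas as pd

# Toy data: six cells with two labels each and two genes
obs = pd.DataFrame(
    {
        "cell_type": ["T-cell", "T-cell", "T-cell", "B-cell", "B-cell", "B-cell"],
        "sample": ["sample1", "sample1", "sample2", "sample1", "sample2", "sample2"],
    }
)
expr = pd.DataFrame(
    {
        "Gene1": [1.0, 3.0, 2.0, 4.0, 6.0, 8.0],
        "Gene2": [0.0, 2.0, 4.0, 0.0, 1.0, 3.0],
    }
)

# One row per cell, labels and expression side by side
cells = obs.join(expr)

# Mean expression for every (cell_type, sample) combination
mean_by_both = cells.groupby(["cell_type", "sample"]).mean()

# Move the sample level into the columns: rows are cell types,
# columns are (gene, sample) pairs
wide = mean_by_both.unstack("sample")
```

The resulting table has one row per cell type and a two-level column header with the gene on top and the sample underneath. Combinations that do not occur in the data would show up as NaN after unstack.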