sc.tl.rank_genes_groups pts

Open wangjiawen2013 opened this issue 4 years ago • 4 comments

When n_genes is set to a value (such as 2000) and pts=True, sc.tl.rank_genes_groups computes the fraction of cells expressing each gene, but the pts output includes all genes, not just the 2000 requested.

wangjiawen2013 avatar Oct 13 '20 03:10 wangjiawen2013

Can you elaborate? What do you mean by "the output"?

In the past we only computed up to 100 genes by default, but now we compute all of them. You can always limit the number of genes you want to see afterwards, so maybe we should remove n_genes from the function or deprecate the parameter.
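A minimal sketch of limiting the genes afterwards, assuming the layout that the snippets in this thread rely on: `names` is a structured array with one field per group, sorted by rank, and `pts` is a DataFrame indexed by gene name that covers every gene. The `result` dict is a made-up stand-in for `adata.uns['rank_genes_groups']`, and `top_genes_with_pts` is a hypothetical helper, not part of scanpy.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for adata.uns['rank_genes_groups'] (toy values, not real data).
result = {
    # One field per group; genes are listed in ranking order.
    "names": np.rec.fromarrays(
        [["g3", "g1", "g5", "g2", "g4"], ["g2", "g4", "g1", "g3", "g5"]],
        names="A,B",
    ),
    # Fraction of expressing cells for every gene, indexed by gene name.
    "pts": pd.DataFrame(
        {"A": [0.9, 0.2, 0.8, 0.1, 0.5], "B": [0.3, 0.7, 0.4, 0.6, 0.2]},
        index=["g1", "g2", "g3", "g4", "g5"],
    ),
}


def top_genes_with_pts(result, group, n_top):
    """Return the n_top ranked genes of one group together with their pts values."""
    names = result["names"][group][:n_top]
    # Look pts up by gene name so the rows follow the ranking order.
    pts = result["pts"][group].reindex(names).to_numpy()
    return pd.DataFrame({"names": names, "pts": pts})


top_a = top_genes_with_pts(result, "A", n_top=2)
print(top_a)
```

The lookup by gene name is the important step: slicing `pts` by position would return the first genes in the original gene order rather than the top-ranked ones.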

fidelram avatar Oct 13 '20 08:10 fidelram

How can one get a DEG table with a pts column for each cluster? So that for each group there would be 4 columns: 'names', 'logfoldchanges', 'pvals_adj' and 'pts'?

Manual sorting from 2 files is not quite optimal:

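import pandas as pd
import scanpy as sc

# Rank genes per cell type and also compute the fraction of expressing cells (pts)
sc.tl.rank_genes_groups(adata, 'cell_types', method='wilcoxon', pts=True)
sc.pl.rank_genes_groups(adata, n_genes=25, sharey=False)
result = adata.uns['rank_genes_groups']
groups = result['names'].dtype.names
# File 1: names, log fold changes and adjusted p-values, one set of columns per group
degs_by_cluster = pd.DataFrame({group + '_' + key[:14]: result[key][group]
    for group in groups for key in ['names', 'logfoldchanges', 'pvals_adj']})
degs_by_cluster.to_csv("DEG_adata_cell_types_pct_to_sort.csv")
# File 2: pts for every gene, one column per group
pts = pd.DataFrame(adata.uns['rank_genes_groups']['pts'])
pts.to_csv("pts_adata.csv")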

Could you help with a more efficient way to do that? @fidelram @ivirshup

ilcink avatar Mar 14 '22 09:03 ilcink

Hello, I am facing the same problem. I would like to get the gene name, log fold change, pval_adj, pts and pts_rest in a single output CSV file, but I have not been able to do that:

sc.tl.rank_genes_groups(adata, "leiden_0.6", method='t-test', pts=True, corr_method='benjamini-hochberg')
pd.DataFrame(adata.uns['rank_genes_groups']['names'])
result = adata.uns['rank_genes_groups']
groups = result['names'].dtype.names
df = pd.DataFrame(
    {group + '_' + key[:1]: result[key][group]
    for group in groups for key in ['names', 'logfoldchanges', 'pts', 'pts_rest', 'pvals', 'pvals_adj']})
df.to_csv("/home/Akila/integration/harmony/subset/celltype/find_markergenes.csv")

Any idea how to get everything into a single file along with pts?

Thanks,
Akila

AkilaRanjith avatar Jun 23 '22 20:06 AkilaRanjith
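One likely reason the attempt above loses columns is the label `group + '_' + key[:1]`: truncating each key to its first character makes `pts`, `pts_rest`, `pvals` and `pvals_adj` all map to the same label, and a dict keeps only the last value per key. A small sketch of the collision, using a hypothetical group name `"0"`:

```python
keys = ["names", "logfoldchanges", "pts", "pts_rest", "pvals", "pvals_adj"]
group = "0"  # hypothetical cluster label, for illustration only

# Labels built from the first character of each key: later keys overwrite earlier ones.
short_labels = {group + "_" + key[:1]: key for key in keys}

# Labels built from the full key stay unique.
full_labels = {group + "_" + key: key for key in keys}

print(short_labels)
print(len(short_labels), "labels with key[:1];", len(full_labels), "labels with the full key")
```

Using the full key in the label avoids the overwrite. Separately, if `pts` is stored indexed by gene name rather than in ranking order (an assumption worth checking on your own `adata`), it still has to be aligned to `names` before it can sit in the same table.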

Try the following code:

# Differential expression and marker genes
result = adata.uns['rank_genes_groups']
groups = result['names'].dtype.names
df1 = pd.DataFrame({group + '_' + key: result[key][group]
    for group in groups for key in ['names', 'scores', 'logfoldchanges', 'pvals', 'pvals_adj']})
df2 = pd.DataFrame({group + '_' + key: result[key][group]
    for group in groups for key in ['pts', 'pts_rest']})
pd.concat(
    [df1[[group + '_names', group + '_scores', group + '_logfoldchanges', group + '_pvals', group + '_pvals_adj']]
        .merge(df2[[group + '_pts', group + '_pts_rest']], how="left", left_on=group + '_names', right_index=True)
     for group in groups],
    axis=1).to_csv("markers.csv")
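As an alternative to the wide layout, the same idea can be sketched as a long table with one row per group and gene, which is often easier to filter and sort. This assumes the same structure as the code above (structured arrays for the ranked fields, gene-indexed DataFrames for `pts` and `pts_rest`); `rank_genes_long` is a hypothetical helper and the `result` dict is toy data standing in for `adata.uns['rank_genes_groups']`.

```python
import numpy as np
import pandas as pd


def rank_genes_long(result, keys=("scores", "logfoldchanges", "pvals", "pvals_adj")):
    """Stack per-group results into one long table, adding pts columns when present."""
    frames = []
    for group in result["names"].dtype.names:
        df = pd.DataFrame({"group": group, "names": result["names"][group]})
        for key in keys:
            if key in result:
                df[key] = result[key][group]  # already in ranking order
        for key in ("pts", "pts_rest"):
            if key in result:
                # Indexed by gene name, so map each ranked gene to its value.
                df[key] = df["names"].map(result[key][group])
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


# Toy stand-in for adata.uns['rank_genes_groups'] (made-up values).
result = {
    "names": np.rec.fromarrays([["g2", "g1"], ["g1", "g3"]], names="A,B"),
    "logfoldchanges": np.rec.fromarrays([[2.0, 1.0], [3.0, 0.5]], names="A,B"),
    "pvals_adj": np.rec.fromarrays([[0.01, 0.04], [0.001, 0.2]], names="A,B"),
    "pts": pd.DataFrame(
        {"A": [0.5, 0.9, 0.1], "B": [0.8, 0.2, 0.6]}, index=["g1", "g2", "g3"]
    ),
}

table = rank_genes_long(result)
print(table)
```

With a real `adata`, something like `rank_genes_long(adata.uns['rank_genes_groups']).to_csv("markers_long.csv", index=False)` would write a single file. Recent scanpy versions also ship `sc.get.rank_genes_groups_df`, which may cover the same need; check its documentation for the exact columns it returns.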

wangjiawen2013 avatar Aug 24 '22 08:08 wangjiawen2013