scanpy
sc.tl.rank_genes_groups pts
When n_genes is set to a value (such as 2000) and pts=True, sc.tl.rank_genes_groups computes the fraction of cells expressing each gene, but the pts output includes all genes, not just the 2000 requested.
Can you elaborate? What do you mean by "the output"?
In the past we only computed up to 100 genes by default, but now we do it for all genes. You can always limit the number of genes you want to see afterwards. So maybe we should remove n_genes from the function or deprecate the parameter.
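As a minimal sketch of "limit the number of genes afterwards": the snippet below builds a small synthetic stand-in for adata.uns['rank_genes_groups'] and keeps only the top n ranked genes of one group, looking up their pts values by gene name. It assumes that 'names' is a structured array with one field per group and that 'pts' is a DataFrame indexed by gene name, which is the layout the later replies in this thread rely on. The helper name top_genes_with_pts and all values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for adata.uns['rank_genes_groups'] (toy values, not real output).
# 'names' is a structured array: one field per group, rows in ranked order.
names = np.array(
    [("g3", "g1"), ("g1", "g4"), ("g2", "g2"), ("g4", "g3")],
    dtype=[("A", "U10"), ("B", "U10")],
)
# 'pts' is assumed to be a DataFrame indexed by gene name, one column per group.
pts = pd.DataFrame(
    {"A": [0.9, 0.2, 0.8, 0.1], "B": [0.7, 0.3, 0.1, 0.6]},
    index=["g1", "g2", "g3", "g4"],
)
result = {"names": names, "pts": pts}


def top_genes_with_pts(result, group, n):
    """Return the top-n ranked genes of one group with their pts values."""
    top = np.asarray(result["names"][group][:n])
    # Look up pts by gene name so the values follow the ranked order.
    top_pts = result["pts"][group].reindex(top).to_numpy()
    return pd.DataFrame({"names": top, "pts": top_pts})


top2 = top_genes_with_pts(result, "A", n=2)
print(top2)
```

With real data, result would be adata.uns['rank_genes_groups'] and n whatever cutoff you want to inspect.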
How can one get a DEG table with a pts column for each cluster, so that each group has 4 columns: 'names', 'logfoldchanges', 'pvals_adj' and 'pts'?
Manual sorting from 2 files is not quite optimal:
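import pandas as pd
import scanpy as sc

# Rank genes per cell type and compute the fraction of expressing cells (pts)
sc.tl.rank_genes_groups(adata, 'cell_types', method='wilcoxon', pts=True)
sc.pl.rank_genes_groups(adata, n_genes=25, sharey=False)

# File 1: ranked names, log fold changes and adjusted p-values per group
result = adata.uns['rank_genes_groups']
groups = result['names'].dtype.names
degs_by_cluster = pd.DataFrame({group + '_' + key: result[key][group]
                                for group in groups for key in ['names', 'logfoldchanges', 'pvals_adj']})
degs_by_cluster.to_csv("DEG_adata_cell_types_pct_to_sort.csv")

# File 2: pts, one column per group (rows are not in ranked order)
pts = pd.DataFrame(adata.uns['rank_genes_groups']['pts'])
pts.to_csv("pts_adata.csv")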
Could you help with a more efficient way to do that? @fidelram @ivirshup
Hello
I am also facing the same problem.
I would like to get the gene name, log fold change, pvals_adj, pts and pts_rest in a single output CSV file, but I wasn't able to do that:
sc.tl.rank_genes_groups(adata, "leiden_0.6", method='t-test', pts=True, corr_method='benjamini-hochberg')
pd.DataFrame(adata.uns['rank_genes_groups']['names'])
result = adata.uns['rank_genes_groups']
groups = result['names'].dtype.names
df = pd.DataFrame(
    {group + '_' + key[:1]: result[key][group]
     for group in groups
     for key in ['names', 'logfoldchanges', 'pts', 'pts_rest', 'pvals', 'pvals_adj']})
df.to_csv("/home/Akila/integration/harmony/subset/celltype/find_markergenes.csv")
Any idea how to get everything into a single file along with pts?
Thanks, Akila
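One detail in the snippet above that may be part of the problem: truncating each key with key[:1] gives 'pts', 'pts_rest', 'pvals' and 'pvals_adj' the same one-letter suffix, so inside a dict comprehension the later entries silently overwrite the earlier ones. The small sketch below uses plain strings in place of the real arrays (the group label '0' is a made-up example) to show which column names survive.

```python
keys = ['names', 'logfoldchanges', 'pts', 'pts_rest', 'pvals', 'pvals_adj']
group = '0'  # hypothetical cluster label

# Same naming scheme as in the post: group + '_' + first letter of the key.
# The value records which key ended up under each column name.
truncated = {group + '_' + key[:1]: key for key in keys}
print(truncated)  # the four keys starting with 'p' share one column name

# Using the full key keeps one column per statistic.
full = {group + '_' + key: key for key in keys}
print(len(truncated), len(full))
```

Separately, the pts tables appear to be indexed by gene name rather than by rank, so even with distinct column names they would need to be matched to the ranked genes by name.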
Try the following code:
# Differential expression and marker genes
import pandas as pd

result = adata.uns['rank_genes_groups']
groups = result['names'].dtype.names
# Ranked statistics: one row per rank, one column per group and key
df1 = pd.DataFrame({group + '_' + key: result[key][group]
                    for group in groups
                    for key in ['names', 'scores', 'logfoldchanges', 'pvals', 'pvals_adj']})
# Expression fractions: indexed by gene name, one column per group and key
df2 = pd.DataFrame({group + '_' + key: result[key][group]
                    for group in groups
                    for key in ['pts', 'pts_rest']})
# For each group, look up pts and pts_rest by gene name, then place the groups side by side
pd.concat(
    [df1[[group + '_names', group + '_scores', group + '_logfoldchanges', group + '_pvals', group + '_pvals_adj']]
     .merge(df2[[group + '_pts', group + '_pts_rest']], how='left', left_on=group + '_names', right_index=True)
     for group in groups],
    axis=1).to_csv("markers.csv")
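If a wide table with one block of columns per group is awkward to filter, a long format (one row per group and gene) may be easier to work with. The sketch below is one possible way to build it with pandas alone, using a synthetic stand-in for adata.uns['rank_genes_groups']. It assumes the same layout as the code above (structured arrays for the ranked statistics, gene-indexed DataFrames for pts and pts_rest). The helper name rank_genes_long and all values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for adata.uns['rank_genes_groups'] (toy values, not real output).
groups = ("A", "B")
names = np.array([("g2", "g1"), ("g1", "g3"), ("g3", "g2")],
                 dtype=[(g, "U10") for g in groups])
pvals_adj = np.array([(0.01, 0.02), (0.04, 0.03), (0.50, 0.90)],
                     dtype=[(g, "f8") for g in groups])
# pts / pts_rest are assumed to be DataFrames indexed by gene name.
genes = ["g1", "g2", "g3"]
pts = pd.DataFrame({"A": [0.5, 0.9, 0.1], "B": [0.8, 0.2, 0.6]}, index=genes)
pts_rest = pd.DataFrame({"A": [0.4, 0.1, 0.3], "B": [0.2, 0.5, 0.1]}, index=genes)
result = {"names": names, "pvals_adj": pvals_adj, "pts": pts, "pts_rest": pts_rest}


def rank_genes_long(result, ranked_keys=("names", "pvals_adj"),
                    by_gene_keys=("pts", "pts_rest")):
    """Stack all groups into one long table: one row per (group, gene)."""
    frames = []
    for group in result["names"].dtype.names:
        # Ranked statistics are positional: row i is the i-th ranked gene.
        df = pd.DataFrame({key: np.asarray(result[key][group]) for key in ranked_keys})
        # Gene-indexed statistics are looked up by name to follow the ranking.
        for key in by_gene_keys:
            df[key] = result[key][group].reindex(df["names"].to_numpy()).to_numpy()
        df.insert(0, "group", group)
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


long_df = rank_genes_long(result)
print(long_df)
```

With real data you would pass adata.uns['rank_genes_groups'] and extend ranked_keys with 'scores', 'logfoldchanges' and 'pvals'. Recent scanpy versions also provide sc.get.rank_genes_groups_df, which returns a per-group DataFrame; it is worth checking its documentation to see whether your version includes the expression fractions.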