pySCENIC
pySCENIC copied to clipboard
pyscenic ctx unknown error
Hi everyone, I am trying to use the pyscenic workflow to infer grn on my 10X scRNA dataset. I completely followed the demonstration workflow posted on SCENICprotocol cancer data sets using SKCM scRNA data. The problem is, when I ran pyscenic ctx subcommand, I got the following error when writing results to files and no matter how I checked, I have no idea what happened.
734 2019-11-19 23:44:53,454 - pyscenic.transform - WARNING - Less than 80% of the genes in TBX3 could be mapped to hg38__refseq-r80__500
735
736 2019-11-19 23:44:53,579 - pyscenic.transform - WARNING - Less than 80% of the genes in TBX6 could be mapped to hg38__refseq-r80__500
737
738 2019-11-19 23:45:01,445 - pyscenic.transform - WARNING - Less than 80% of the genes in HES2 could be mapped to hg38__refseq-r80__500
739
740 2019-11-19 23:45:01,590 - pyscenic.transform - WARNING - Less than 80% of the genes in TEF could be mapped to hg38__refseq-r80__500b
741
742 2019-11-19 23:45:04,100 - pyscenic.transform - WARNING - Less than 80% of the genes in HES5 could be mapped to hg38__refseq-r80__500
743
744 2019-11-19 23:45:07,137 - pyscenic.transform - WARNING - Less than 80% of the genes in HIC1 could be mapped to hg38__refseq-r80__500
745
746 2019-11-19 23:45:08,539 - pyscenic.transform - WARNING - Less than 80% of the genes in ESR2 could be mapped to hg38__refseq-r80__500
747
748 2019-11-19 23:49:29,864 - pyscenic.cli.pyscenic - INFO - Writing results to file.
749 Traceback (most recent call last):
750 File "/home/liuziyang/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pandas/core/frame.py", line 3540, in _ensure_valid_inde
751 value = Series(value)
752 File "/home/liuziyang/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pandas/core/series.py", line 316, in init
753 data = SingleBlockManager(data, index, fastpath=True)
754 File "/home/liuziyang/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1516, in __ini
755 block = make_block(block, placement=slice(0, len(axis)), ndim=1)
756 File "/home/liuziyang/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 3284, in make_bl
757 return klass(values, ndim=ndim, placement=placement)
758 File "/home/liuziyang/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 2792, in _init
759 super().init(values, ndim=ndim, placement=placement)
760 File "/home/liuziyang/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 128, in init
761 "{mgr}".format(val=len(self.values), mgr=len(self.mgr_locs))
762 ValueError: Wrong number of items passed 8, placement implies 0
763
...skipping...
750 File "/home/liuziyang/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pandas/core/frame.py", line 3540, in _ensure_valid_inde
751 value = Series(value)
752 File "/home/liuziyang/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pandas/core/series.py", line 316, in init
753 data = SingleBlockManager(data, index, fastpath=True)
754 File "/home/liuziyang/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1516, in __ini
755 block = make_block(block, placement=slice(0, len(axis)), ndim=1)
756 File "/home/liuziyang/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 3284, in make_bl
757 return klass(values, ndim=ndim, placement=placement)
758 File "/home/liuziyang/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 2792, in _init
759 super().init(values, ndim=ndim, placement=placement)
760 File "/home/liuziyang/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 128, in init
761 "{mgr}".format(val=len(self.values), mgr=len(self.mgr_locs))
762 ValueError: Wrong number of items passed 8, placement implies 0
763
764 During handling of the above exception, another exception occurred:
765
766 Traceback (most recent call last):
767 File "/Bailab3/liuziyang/Program/ShellScript/pyscenic/pyscenic.case.study.v22.py", line 177, in
Please help
Dear,
The problem seems to be in creation of regulons from the enriched motif table. A temporary and quick workaround could be the use the CLI command for the cisTarget step to create the enriched motif table instead of the regulons and then use some Python code to the transformation to regulons (cf https://github.com/aertslab/pySCENIC/issues/105#issuecomment-553561902).
To generate the enriched motif table from the CLI use a the -o option from pyscenic ctx subtask with '.csv' or '.tsv' as the filename extension:
--output Output file/stream, i.e. a table of enriched motifs and target genes (csv, tsv) or collection of regulons (yaml, gmt, dat, json).
Kindest regards, Bram
Thank you for your reply. But I already tried that. Here is my code, hopefully you will find something odd here.
import os, glob, re, pickle from functools import partial from collections import OrderedDict import operator as op from cytoolz import compose
import pandas as pd import seaborn as sns import numpy as np import scanpy as sc import anndata as ad import matplotlib as mpl import matplotlib.pyplot as plt
from pyscenic.export import export2loom, add_scenic_metadata from pyscenic.utils import load_motifs from pyscenic.transform import df2regulons from pyscenic.aucell import aucell from pyscenic.binarization import binarize from pyscenic.rss import regulon_specificity_scores from pyscenic.plotting import plot_binarization, plot_rss
from IPython.display import HTML, display
------------------------------------------------------------------------------------------
Set maximum number of jobs for Scanpy.
sc.settings.njobs = 20
Specify folders.
RESOURCES_FOLDERNAME = "/ailab7/PROJECT/ziyang/CC_hetero/scRNA/pyscenic/resources/" #TPM matrix and cell annotation AUXILLIARIES_FOLDERNAME = "/ailab3/ziyang/Database/SCENIC/" #Feather file RESULTS_FOLDERNAME = "/ailab7/PROJECT/ziyang/CC_hetero/scRNA/pyscenic/results/" # For results FIGURES_FOLDERNAME = "/ailab7/PROJECT/ziyang/CC_hetero/scRNA/pyscenic/figures/" # For plots sc.settings.figdir = FIGURES_FOLDERNAME
Auxilliary functions.
BASE_URL = "http://motifcollections.aertslab.org/v9/logos/" COLUMN_NAME_LOGO = "MotifLogo" COLUMN_NAME_MOTIF_ID = "MotifID" COLUMN_NAME_TARGETS = "TargetGenes"
def savesvg(fname: str, fig, folder: str=FIGURES_FOLDERNAME) -> None: """ Save figure as vector-based SVG image format. """ fig.tight_layout() fig.savefig(os.path.join(folder, fname), format='svg')
def display_logos(df: pd.DataFrame, top_target_genes: int = 3, base_url: str = BASE_URL): """ :param df: :param base_url: """ # Make sure the original dataframe is not altered. df = df.copy()
# Add column with URLs to sequence logo.
def create_url(motif_id):
return '<img src="{}{}.png" style="max-height:124px;"></img>'.format(base_url, motif_id)
df[("Enrichment", COLUMN_NAME_LOGO)] = list(map(create_url, df.index.get_level_values(COLUMN_NAME_MOTIF_ID)))
# Truncate TargetGenes.
def truncate(col_val):
return sorted(col_val, key=op.itemgetter(1))[:top_target_genes]
df[("Enrichment", COLUMN_NAME_TARGETS)] = list(map(truncate, df[("Enrichment", COLUMN_NAME_TARGETS)]))
MAX_COL_WIDTH = pd.get_option('display.max_colwidth')
pd.set_option('display.max_colwidth', -1)
display(HTML(df.head().to_html(escape=False)))
pd.set_option('display.max_colwidth', MAX_COL_WIDTH)
Auxilliary data sets.
HUMAN_TFS_FNAME = os.path.join(AUXILLIARIES_FOLDERNAME, 'hs_hgnc_curated_tfs.txt') RANKING_DBS_FNAMES = list(map(lambda fn: os.path.join(AUXILLIARIES_FOLDERNAME, fn), ['hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather','hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather'])) MOTIF_ANNOTATIONS_FNAME = os.path.join(AUXILLIARIES_FOLDERNAME, 'motifs-v9-nr.hgnc-m0.001-o0.0.tbl')
Data acquisition and result storage.
DATASET_ID = "P11" CODE = 'CC' CELL_ANNOTATIONS_FNAME = os.path.join(RESOURCES_FOLDERNAME, "CC_cell.annotations.csv") #SAMPLE_METADATA_FNAME = os.path.join(RESOURCES_FOLDERNAME, "1-s2.0-S0092867418311784-mmc1.xlsx") EXP_MTX_TPM_FNAME = os.path.join(RESOURCES_FOLDERNAME, 'CC_tpm.csv') #EXP_MTX_COUNTS_FNAME = os.path.join(RESOURCES_FOLDERNAME, 'GSE115978_counts.csv') METADATA_FNAME = os.path.join(RESULTS_FOLDERNAME, '{}.metadata.csv'.format(DATASET_ID)) EXP_MTX_QC_FNAME = os.path.join(RESULTS_FOLDERNAME, '{}.qc.tpm.csv'.format(DATASET_ID)) ADJACENCIES_FNAME = os.path.join(RESULTS_FOLDERNAME, '{}.adjacencies.tsv'.format(DATASET_ID)) MOTIFS_FNAME = os.path.join(RESULTS_FOLDERNAME, '{}.motifs.csv'.format(DATASET_ID)) REGULONS_DAT_FNAME = os.path.join(RESULTS_FOLDERNAME, '{}.regulons.dat'.format(DATASET_ID)) AUCELL_MTX_FNAME = os.path.join(RESULTS_FOLDERNAME, '{}.auc.csv'.format(DATASET_ID)) BIN_MTX_FNAME = os.path.join(RESULTS_FOLDERNAME, '{}.bin.csv'.format(DATASET_ID)) THR_FNAME = os.path.join(RESULTS_FOLDERNAME, '{}.thresholds.csv'.format(DATASET_ID)) ANNDATA_FNAME = os.path.join(RESULTS_FOLDERNAME, '{}.h5ad'.format(DATASET_ID)) LOOM_FNAME = os.path.join(RESULTS_FOLDERNAME, '{}_{}.loom'.format(CODE, DATASET_ID))
STEP 0: Preprocessing.
STEP 0.1: METADATA CLEANING. Optional
df_annotations = pd.read_csv(CELL_ANNOTATIONS_FNAME) #df_annotations['samples'] = df_annotations['samples'].str.upper() #df_annotations.rename(columns={'cell.types': 'cell_type', 'cells': 'cell_id', 'samples': 'sample_id', 'treatment.group': 'treatment_group', 'Cohort': 'cohort'}, inplace=True) #df_annotations['cell_type'] = df_annotations.cell_type.replace({
'Mal': 'Malignant Cell',
'Endo.': 'Endothelial Cell',
'T.CD4': 'Thelper Cell',
'CAF': 'Fibroblast',
'T.CD8': 'Tcytotoxic Cell',
'T.cell': 'T Cell',
'NK': 'NK Cell',
'B.cell': 'B Cell'})
df_samples = pd.read_excel(SAMPLE_METADATA_FNAME, header=2)
df_samples = df_samples.drop(['Cohort'], axis=1)
df_samples['Sample'] = df_samples.Sample.str.upper()
df_metadata = pd.merge(df_annotations, df_samples, left_on='sample_id', right_on='Sample')
df_metadata = df_metadata.drop(['Sample', 'treatment_group'], axis=1)
df_metadata.rename(columns={'Patient': 'patient_id',
'Age': 'age', 'Sex': 'sex', 'Lesion type': 'lesion_type', 'Site': 'site',
'Treatment': 'treatment', 'Treatment group': 'treatment_group'}, inplace=True)
df_annotations.to_csv(METADATA_FNAME, index=False)
STEP 0.2: EXPRESSION MATRIX QC
df_counts = pd.read_csv(EXP_MTX_COUNTS_FNAME, index_col=0)
df_tpm = pd.read_csv(EXP_MTX_TPM_FNAME, index_col=0) adata = sc.AnnData(X=df_tpm.T.sort_index()) #df_obs = df_annotations[['Cell', 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'Sample','Cell_type']].set_index('Cell').sort_index() #adata.obs = df_obs adata.var_names_make_unique() #$sc.pp.filter_cells(adata, min_genes=200) sc.pp.filter_genes(adata, min_cells=5)
Store non-log transformed data as raw. This data can be used via the use_raw parameters available for many functions.
In the scanpy's tutorials this is used to stored all genes in log-transformed counts before retaining only Highly Variable Genes (HVG).
Because in this case no filtering is done we use this feature to store raw counts.
#adata.original = adata #sc.pp.log1p(adata) adata.write_h5ad(ANNDATA_FNAME) # Categorical dtypes are created. adata.to_df().to_csv(EXP_MTX_QC_FNAME)
STEP 1: Network inference based on GRNBoost2 from CLI.
For this step the CLI version of SCENIC is used. This step can be deployed on an High Performance Computing system.
Output: List of adjacencies between a TF and its targets stored in ADJACENCIES_FNAME.
commands_grn = 'pyscenic grn {} {} -o {} --num_workers 30'.format(EXP_MTX_QC_FNAME, HUMAN_TFS_FNAME, ADJACENCIES_FNAME) os.system(commands_grn)
pyscenic grn {EXP_MTX_QC_FNAME} {HUMAN_TFS_FNAME} -o {ADJACENCIES_FNAME} --num_workers 10
STEP 2-3: Regulon prediction aka cisTarget from CLI.
For this step the CLI version of SCENIC is used. This step can be deployed on an High Performance Computing system.
Output: List of adjacencies between a TF and its targets stored in MOTIFS_FNAME.
#DBS_PARAM = ' '.join(RANKING_DBS_FNAMES) DBS_PARAM = '/Bailab3/liuziyang/Database/SCENIC/hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather /Bailab3/liuziyang/Database/SCENIC/hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather' #commands_ctx = 'pyscenic ctx {} {} --annotations_fname {} --expression_mtx_fname {} -o {} --num_workers 30'.format(ADJACENCIES_FNAME, DBS_PARAM, MOTIF_ANNOTATIONS_FNAME, EXP_MTX_QC_FNAME, MOTIFS_FNAME) commands_ctx = 'pyscenic ctx -o {} --annotations_fname {} --num_workers 30 --expression_mtx_fname {} {} {}'.format(MOTIFS_FNAME, MOTIF_ANNOTATIONS_FNAME, EXP_MTX_QC_FNAME, ADJACENCIES_FNAME, DBS_PARAM) os.system(commands_ctx)
df_motifs = load_motifs(MOTIFS_FNAME)
STEP 4: Cellular enrichment aka AUCell.
REGULON CREATION.
def derive_regulons(motifs, db_names=('hg38__refseq-r80__10kb_up_and_down_tss', 'hg38__refseq-r80__500bp_up_and_100bp_down_tss')):
motifs.columns = motifs.columns.droplevel(0)
def contains(*elems):
def f(context):
return any(elem in context for elem in elems)
return f
# For the creation of regulons we only keep the 10-species databases and the activating modules. We also remove the
# enriched motifs for the modules that were created using the method 'weight>50.0%' (because these modules are not part
# of the default settings of modules_from_adjacencies anymore.
motifs = motifs[
np.fromiter(map(compose(op.not_, contains('weight>50.0%')), motifs.Context), dtype=np.bool) &
np.fromiter(map(contains(*db_names), motifs.Context), dtype=np.bool) &
np.fromiter(map(contains('activating'), motifs.Context), dtype=np.bool)]
# We build regulons only using enriched motifs with a NES of 3.0 or higher; we take only directly annotated TFs or TF annotated
# for an orthologous gene into account; and we only keep regulons with at least 10 genes.
regulons = list(filter(lambda r: len(r) >= 10, df2regulons(motifs[(motifs['NES'] >= 3.0) & ((motifs['Annotation'] == 'gene is directly annotated') | (motifs['Annotation'].str.startswith('gene is orthologous to') & motifs['Annotation'].str.endswith('which is directly annotated for motif')))])))
# Rename regulons, i.e. remove suffix.
return list(map(lambda r: r.rename(r.transcription_factor), regulons))
regulons = derive_regulons(df_motifs)
Pickle these regulons.
with open(REGULONS_DAT_FNAME, 'wb') as f: pickle.dump(regulons, f)
AUCELL.
auc_mtx = aucell(df_tpm.T, regulons, num_workers=20) auc_mtx.to_csv(AUCELL_MTX_FNAME) auc_mtx = pd.read_csv(AUCELL_MTX_FNAME, index_col=0)
OPTIONAL STEP 5 - Regulon activity binarization.
bin_mtx, thresholds = binarize(auc_mtx) bin_mtx.to_csv(BIN_MTX_FNAME) thresholds.to_frame().rename(columns={0:'threshold'}).to_csv(THR_FNAME) bin_mtx = pd.read_csv(BIN_MTX_FNAME, index_col=0) thresholds = pd.read_csv(THR_FNAME, index_col=0).threshold
AUC score of selected genes across cells.
fig, ((ax1, ax2, ax3, ax4), (ax5, ax6, ax7, ax8)) = plt.subplots(2, 4, figsize=(8, 4), dpi=300) plot_binarization(auc_mtx, 'NFKB2', thresholds['NFKB2'], ax=ax1) plot_binarization(auc_mtx, 'MITF', thresholds['MITF'], ax=ax2) plot_binarization(auc_mtx, 'FOXP3', thresholds['FOXP3'], ax=ax3) plot_binarization(auc_mtx, 'PAX5', thresholds['PAX5'], ax=ax4) plot_binarization(auc_mtx, 'IRF8', thresholds['IRF8'], ax=ax5) plot_binarization(auc_mtx, 'IRF3', thresholds['IRF3'], ax=ax6) plot_binarization(auc_mtx, 'MLX', thresholds['MLX'], ax=ax7) plot_binarization(auc_mtx, 'YY1', thresholds['YY1'], ax=ax8) plt.tight_layout() savesvg('hists-ICC_P11-binarization.svg', fig)
Create heatmap with binarized regulon activity.
def palplot(pal, names, colors=None, size=1): n = len(pal) f, ax = plt.subplots(1, 1, figsize=(n * size, size)) ax.imshow(np.arange(n).reshape(1, n), cmap=mpl.colors.ListedColormap(list(pal)), interpolation="nearest", aspect="auto") ax.set_xticks(np.arange(n) - .5) ax.set_yticks([-.5, .5]) ax.set_xticklabels([]) ax.set_yticklabels([]) colors = n * ['k'] if colors is None else colors for idx, (name, color) in enumerate(zip(names, colors)): ax.text(0.0+idx, 0.0, name, color=color, horizontalalignment='center', verticalalignment='center') return f
N_COLORS = len(adata.obs.Sample.dtype.categories) COLORS = [color['color'] for color in mpl.rcParams["axes.prop_cycle"]] cell_type_color_lut = dict(zip(adata.obs.Sample.dtype.categories, COLORS)) #cell_type_color_lut = dict(zip(adata.obs.cell_type.dtype.categories, adata.uns['cell_type_colors'])) cell_id2cell_type_lut = df_metadata.set_index('Cell').cell_type.to_dict() bw_palette = sns.xkcd_palette(["white", "black"]) sns.set() sns.set_style("whitegrid") fig = palplot(bw_palette, ['OFF', 'ON'], ['k', 'w']) savesvg('legend-CC_P11-on_off.svg', fig)
sns.set() sns.set(font_scale=1.0) sns.set_style("ticks", {"xtick.minor.size": 1, "ytick.minor.size": 0.1}) g = sns.clustermap(bin_mtx.T, col_colors=auc_mtx.index.map(cell_id2cell_type_lut).map(cell_type_color_lut), cmap=bw_palette, figsize=(20,20)) g.ax_heatmap.set_xticklabels([]) g.ax_heatmap.set_xticks([]) g.ax_heatmap.set_xlabel('Cells') g.ax_heatmap.set_ylabel('Regulons') g.ax_col_colors.set_yticks([0.5]) g.ax_col_colors.set_yticklabels(['Cell Type']) g.cax.set_visible(False) g.fig.savefig(os.path.join(FIGURES_FOLDERNAME, 'clustermap-ICC_P11.png'), format='png')
Save clustered binarized heatmap to Excel for further inspection.
bin_mtx_clustered = bin_mtx.T.copy() bin_mtx_clustered.rename(columns=df_annotations.set_index('Cell')['Sample'].to_dict(), inplace=True) bin_mtx_clustered.iloc[g.dendrogram_row.reordered_ind, g.dendrogram_col.reordered_ind].to_excel(os.path.join(RESULTS_FOLDERNAME, 'ICC_P11-Binarized regulon activity.xlsx'))
Generate sequence logos.
def fetch_logo(regulon, base_url = BASE_URL):
for elem in regulon.context:
if elem.endswith('.png'):
return ''.format(base_url, elem)
return ""
df_regulons = pd.DataFrame(data=[list(map(op.attrgetter('name'), regulons)), list(map(len, regulons)), list(map(fetch_logo, regulons))], index=['name', 'count', 'logo']).T
MAX_COL_WIDTH = pd.get_option('display.max_colwidth') pd.set_option('display.max_colwidth', -1) #display(HTML(df_regulons.head().to_html(escape=False))) pd.set_option('display.max_colwidth', MAX_COL_WIDTH)
STEP 6: Non-linear projection and clustering.
In this step a scanpy.AnnData object is created containing all metadata and results.
First, we select and retain only the most variable genes in the dataset.
PCA + tSNE PROJECTION.
sns.set() sc.pp.highly_variable_genes(adata) sc.pl.highly_variable_genes(adata) adata = adata[:, adata.var['highly_variable']]
Then we apply a linear-dimensional reduction technique (PCA).
sc.tl.pca(adata, svd_solver='arpack') sc.pl.pca(adata, color='CD8A') sc.pl.pca_variance_ratio(adata, log=False)
Followed by a tSNE projection.
sc.tl.tsne(adata) sc.set_figure_params(frameon=False, dpi=300, fontsize=8) sc.pl.tsne(adata, color=['Sample'], title=['CC_P11-Sample', 'CC_P11-{} Samples'.format(len(adata.obs.Sample.unique()))], palette=COLORS, save='CC_P11-PCA+tSNE.svg')
We capture the non-linear projection based on PCA+tSNE for later storage in the loom file.
embedding_pca_tsne = pd.DataFrame(adata.obsm['X_tsne'], columns=[['_X', '_Y']], index=adata.obs_names)
We add all metadata derived from SCENIC to the scanpy.AnnData object.
add_scenic_metadata(adata, auc_mtx, regulons) adata.write_h5ad(ANNDATA_FNAME)
AUCELL + tSNE PROJECTION.
We change the tSNE projection so that it relies on AUCell instead of PCA.
sc.tl.tsne(adata, use_rep='X_aucell') embedding_aucell_tsne = pd.DataFrame(adata.obsm['X_tsne'], columns=[['_X', '_Y']], index=adata.obs_names) sc.set_figure_params(frameon=False, dpi=300, fontsize=8) sc.pl.tsne(adata, color=['cell_type', 'lesion_type', 'patient_id'], title=['CC_P11-Sample', 'CC_P11-{} Samples'.format(len(adata.obs.Sample.unique()))], palette=COLORS, save='ICC_P11-AUCell+tSNE.svg')
CELL TYPE SPECIFIC REGULATORS - Z-SCORE.
To find cell type specific regulators we use a Z score (i.e. the average AUCell score for the cells of a give type are standardized using the overall average AUCell scores and its standard deviation).
df_obs = adata.obs signature_column_names = list(df_obs.select_dtypes('number').columns) signature_column_names = list(filter(lambda s: s.startswith('Regulon('), signature_column_names)) df_scores = df_obs[signature_column_names + ['Sample']] df_results = ((df_scores.groupby(by='Sample').mean() - df_obs[signature_column_names].mean())/ df_obs[signature_column_names].std()).stack().reset_index().rename(columns={'level_1': 'regulon', 0:'Z'}) df_results['regulon'] = list(map(lambda s: s[8:-1], df_results.regulon)) df_results[(df_results.Z >= 3.0)].sort_values('Z', ascending=False)
df_heatmap = pd.pivot_table(data=df_results[df_results.Z >= 3.0].sort_values('Z', ascending=False), index='cell_type', columns='regulon', values='Z') #df_heatmap.drop(index='Myocyte', inplace=True) # We leave out Myocyte because many TFs are highly enriched (becuase of small number of cells). fig, ax1 = plt.subplots(1, 1, figsize=(10, 8)) sns.heatmap(df_heatmap, ax=ax1, annot=True, fmt=".1f", linewidths=.7, cbar=False, square=True, linecolor='gray', cmap="YlGnBu", annot_kws={"size": 6}) ax1.set_ylabel('') savesvg('heatmap-ICC_P11-regulons.svg', fig)
CELL TYPE SPECIFIC REGULATORS - RSS.
#rss = regulon_specificity_scores(auc_mtx, adata.obs.Sample) #sns.set() #sns.set(style='whitegrid', font_scale=0.8) #fig, ((ax1, ax2, ax3, ax4), (ax5, ax6, ax7, ax8)) = plt.subplots(2, 4, figsize=(8, 6), dpi=300) #plot_rss(rss, 'B Cell', ax=ax1) #ax1.set_xlabel('') #plot_rss(rss, 'T Cell', ax=ax2) #ax2.set_xlabel('') #ax2.set_ylabel('') #plot_rss(rss, 'Thelper Cell', ax=ax3) #ax3.set_xlabel('') #ax3.set_ylabel('') #plot_rss(rss, 'Tcytotoxic Cell', ax=ax4) #ax4.set_xlabel('') #ax4.set_ylabel('') #plot_rss(rss, 'NK Cell', ax=ax5) #plot_rss(rss, 'Fibroblast', ax=ax6) #ax6.set_ylabel('') #plot_rss(rss, 'Macrophage', ax=ax7) #ax7.set_ylabel('') #plot_rss(rss, 'Endothelial Cell', ax=ax8) #ax8.set_ylabel('') #plt.tight_layout() #savesvg('plots - GSE115978 - rss.svg', fig)
#sc.set_figure_params(frameon=False, dpi=300, fontsize=8) #sc.pl.tsne(adata, color=['Sample', 'PAX5', 'TCF7', 'EOMES', 'TBX21', 'PRRX2', 'MAFB'],
title=['GSE115978 - SKCM - Cell types', 'PAX5', 'TCF7', 'EOMES', 'TBX21', 'PRRX2', 'MAFB'], ncols=4, use_raw=False, save=' - GSE115978 - rss_gene.svg', palette=COLORS, cmap='magma')
#sc.set_figure_params(frameon=False, dpi=300, fontsize=8) #sc.pl.tsne(adata, color=['Sample', 'Regulon(PAX5)', 'Regulon(TCF7)', 'Regulon(EOMES)', 'Regulon(TBX21)', 'Regulon(PRRX2)', 'Regulon(MAFB)', 'Regulon(SOX7)'],
title=['GSE115978 - SKCM - Cell types', 'Regulon(PAX5)', 'Regulon(TCF7)', 'Regulon(EOMES)', 'Regulon(TBX21)', 'Regulon(PRRX2)', 'Regulon(MAFB)', 'Regulon(SOX7)'], ncols=4, use_raw=False,
save=' - GSE115978 - rss_regulon.svg', palette=COLORS, cmap='viridis')
SELECTED REGULONS.
With the metadata from SCENIC add to the AnnData object, we can create cellular scatterplots for regulon activity. In the example we compare MITF regulon enrichment with the expression of the MITF gene.
#sc.pl.tsne(adata, color=['cell_type', 'Regulon(MITF)', 'Regulon(TWIST1)'],
title=['GSE115978 - SKCM - Cell types', 'Regulon(MITF)', 'Regulon(TWIST1)'], ncols=3, use_raw=False, save=' - GSE115978 - selected_regulons1.svg', palette=COLORS, cmap='viridis')
#sc.pl.tsne(adata, color=['cell_type', 'MITF'],
title=['GSE115978 - SKCM - Cell types', 'MITF'], ncols=3, use_raw=False, save=' - GSE115978 - selected_regulons2.svg', palette=COLORS, cmap='magma')
#sc.set_figure_params(frameon=False, dpi=300, fontsize=8) #sc.pl.tsne(adata, color=['cell_type', 'Regulon(MYC)'],
title=['GSE115978 - SKCM - Cell types', 'MYC - Regulon'], ncols=3, palette=COLORS, use_raw=False, save=' - GSE115978 - AUCell+tSNE - Regulon MYC.svg', cmap='viridis')
AUCELL DISTRIBUTIONS.
We can also analyze the distribution of AUCell values via the plots in the scanpy package.
#df_results.sort_values('Z', ascending=False).groupby(by='cell_type').head(2) #aucell_adata = sc.AnnData(X=auc_mtx.sort_index()) #aucell_adata.obs = df_obs #names = list(map(op.attrgetter('name'), filter(lambda r: r.score > 8.0, regulons))) #sc.pl.stacked_violin(aucell_adata, names, groupby='cell_type', save=' - GSE115978 - regulons.svg')
STEP 7: Export to SCope.
export2loom(adata.to_df(), regulons, LOOM_FNAME, cell_annotations=adata.obs['Sample'].to_dict(), embeddings=OrderedDict([('AUCell + tSNE', embedding_aucell_tsne), ('PCA + tSNE', embedding_pca_tsne)]), auc_mtx = auc_mtx, tree_structure=(), title='{}_{}'.format(CODE, DATASET_ID), nomenclature="HGNC", auc_thresholds=thresholds, compress=True)
I commented out some codes here for use in my case.
Dear,
The problem seems to be in creation of regulons from the enriched motif table. A temporary and quick workaround could be the use the CLI command for the cisTarget step to create the enriched motif table instead of the regulons and then use some Python code to the transformation to regulons (cf #105 (comment)).
To generate the enriched motif table from the CLI use a the -o option from pyscenic ctx subtask with '.csv' or '.tsv' as the filename extension:
--output Output file/stream, i.e. a table of enriched motifs and target genes (csv, tsv) or collection of regulons (yaml, gmt, dat, json).Kindest regards, Bram
Dear Bram! I also encountered the same problem in the step "REGULON CREATION". I run the pipeline completely followed the cancer case study, and when i run:
!pyscenic ctx {ADJACENCIES_FNAME} {DBS_PARAM} --annotations_fname {MOTIF_ANNOTATIONS_FNAME} --expression_mtx_fname {EXP_MTX_QC_FNAME} --output {MOTIFS_FNAME} --num_workers 26 --mask_dropouts
df_motifs = load_motifs(MOTIFS_FNAME)
regulons = derive_regulons(df_motifs)
#################### I got the following error messages:
ValueError Traceback (most recent call last) ~/miniconda3/envs/scenic_protocol/lib/python3.6/site-packages/pandas/core/frame.py in _ensure_valid_index(self, value) 3539 try: -> 3540 value = Series(value) 3541 except (ValueError, NotImplementedError, TypeError):
~/miniconda3/envs/scenic_protocol/lib/python3.6/site-packages/pandas/core/series.py in init(self, data, index, dtype, name, copy, fastpath) 315 --> 316 data = SingleBlockManager(data, index, fastpath=True) 317
~/miniconda3/envs/scenic_protocol/lib/python3.6/site-packages/pandas/core/internals/managers.py in init(self, block, axis, do_integrity_check, fastpath) 1515 if not isinstance(block, Block): -> 1516 block = make_block(block, placement=slice(0, len(axis)), ndim=1) 1517
~/miniconda3/envs/scenic_protocol/lib/python3.6/site-packages/pandas/core/internals/blocks.py in make_block(values, placement, klass, ndim, dtype, fastpath) 3283 -> 3284 return klass(values, ndim=ndim, placement=placement) 3285
~/miniconda3/envs/scenic_protocol/lib/python3.6/site-packages/pandas/core/internals/blocks.py in init(self, values, placement, ndim) 2791 -> 2792 super().init(values, ndim=ndim, placement=placement) 2793
~/miniconda3/envs/scenic_protocol/lib/python3.6/site-packages/pandas/core/internals/blocks.py in init(self, values, placement, ndim) 127 "Wrong number of items passed {val}, placement implies " --> 128 "{mgr}".format(val=len(self.values), mgr=len(self.mgr_locs)) 129 )
ValueError: Wrong number of items passed 8, placement implies 0
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
~/miniconda3/envs/scenic_protocol/lib/python3.6/site-packages/pyscenic/transform.py in df2regulons(df, save_columns) 316 # Activating is the default! 317 return REPRESSING_MODULE if REPRESSING_MODULE in ctx else ACTIVATING_MODULE --> 318 df[COLUMN_NAME_TYPE] = df.apply(get_type,axis=1) 319 320 # Group all rows per TF and type (+)/(-). Each group results in a single regulon.
~/miniconda3/envs/scenic_protocol/lib/python3.6/site-packages/pandas/core/frame.py in setitem(self, key, value) 3485 else: 3486 # set column -> 3487 self._set_item(key, value) 3488 3489 def _setitem_slice(self, key, value):
~/miniconda3/envs/scenic_protocol/lib/python3.6/site-packages/pandas/core/frame.py in _set_item(self, key, value) 3561 """ 3562 -> 3563 self._ensure_valid_index(value) 3564 value = self._sanitize_column(key, value) 3565 NDFrame._set_item(self, key, value)
~/miniconda3/envs/scenic_protocol/lib/python3.6/site-packages/pandas/core/frame.py in _ensure_valid_index(self, value) 3541 except (ValueError, NotImplementedError, TypeError): 3542 raise ValueError( -> 3543 "Cannot set a frame with no defined index " 3544 "and a value that cannot be converted to a " 3545 "Series"
ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series
It seems something wrong with the derive_regulons steps. Please help Many thanks!
I have the same problem
adding this option helped.
--mask_dropouts