Error in executing si.tl.find_master_regulators: KeyError: 'SREBF1'
I am analyzing scRNA-seq and scATAC-seq datasets that I collected myself. The two datasets were obtained separately. I integrated them following the multimodal analysis workflow, which gave me several objects: adata_G, adata_M, adata_all, adata_cmp_CG, and adata_cmp_CM. I then ran the following code:
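> import pandas as pd
> import simba as si
>
> # Pair each motif with every gene whose name appears in the motif name (split on '_')
> motifs_genes = pd.DataFrame(columns=['motif', 'gene'])
> for x in adata_M.obs_names:
>     x_split = x.split('_')
>     for y in adata_G.obs_names:
>         if y in x_split:
>             motifs_genes.loc[motifs_genes.shape[0]] = [x, y]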
>
> motifs_genes
> duplicates = motifs_genes['motif'].duplicated()
> motifs_genes[duplicates]
>
> print(motifs_genes.shape)
> motifs_genes.head()
>
> motifs_genes_no_duplicates = motifs_genes.drop_duplicates(subset=['motif'])
>
>
> list_tf_motif = motifs_genes_no_duplicates['motif'].tolist()
> list_tf_gene = motifs_genes_no_duplicates['gene'].tolist()
>
> df_metrics_motif = adata_cmp_CM.var.copy()
> df_metrics_gene = adata_cmp_CG.var.copy()
>
> df_metrics_motif.head()
> df_metrics_gene.head()
>
> si.pl.entity_metrics(adata_cmp_CG, x='max', y='gini',
>     show_texts=False,
>     show_cutoff=True,
>     show_contour=True,
>     c='#607e95',
>     cutoff_x=1.5,
>     cutoff_y=0.35)
>
> len(list_tf_motif)
> len(list_tf_gene)
>
> df_MR = si.tl.find_master_regulators(adata_all,
>     list_tf_motif=list_tf_motif,
>     list_tf_gene=list_tf_gene,
>     cutoff_gene_max=1.5,
>     cutoff_gene_gini=0.35,
>     cutoff_motif_max=3,
>     cutoff_motif_gini=0.7,
>     metrics_gene=df_metrics_gene,
>     metrics_motif=df_metrics_motif
> )
>
> adata_all.obs
The following error occurred while running si.tl.find_master_regulators:
Traceback (most recent call last):
File "/root/anaconda3/envs/env_simba/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3791, in get_loc
return self._engine.get_loc(casted_key)
File "index.pyx", line 152, in pandas._libs.index.IndexEngine.get_loc
File "index.pyx", line 181, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'SREBF1'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/root/anaconda3/envs/env_simba/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3553, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-235-6901c5c76ad1>", line 1, in <module>
df_MR = si.tl.find_master_regulators(adata_all,
File "/root/anaconda3/envs/env_simba/lib/python3.10/site-packages/simba/tools/_post_training.py", line 618, in find_master_regulators
df_MR.loc[i, 'rank'] = dist_MG.loc[x_motif, ].rank()[x_gene]
File "/root/anaconda3/envs/env_simba/lib/python3.10/site-packages/pandas/core/series.py", line 1040, in __getitem__
return self._get_value(key)
File "/root/anaconda3/envs/env_simba/lib/python3.10/site-packages/pandas/core/series.py", line 1156, in _get_value
loc = self.index.get_loc(label)
File "/root/anaconda3/envs/env_simba/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3798, in get_loc
raise KeyError(key) from err
KeyError: 'SREBF1'
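The last SIMBA frame in the traceback looks up x_gene in one row of dist_MG, so the KeyError suggests that 'SREBF1' is present in list_tf_gene but is not one of the labels of that row. Below is a minimal sketch of a check that could be run before calling the function. It uses toy data in place of the real objects; the helper name `split_pairs_by_presence`, the toy motif names, and the `available_genes` index are hypothetical, and which entities SIMBA actually uses as gene labels is an assumption I have not verified.

```python
import pandas as pd


def split_pairs_by_presence(list_tf_motif, list_tf_gene, available_genes):
    """Split (motif, gene) pairs by whether the gene is in available_genes.

    Hypothetical helper for illustration; it is not part of SIMBA.
    """
    available = set(available_genes)
    kept, missing = [], []
    for motif, gene in zip(list_tf_motif, list_tf_gene):
        if gene in available:
            kept.append((motif, gene))
        else:
            missing.append((motif, gene))
    return kept, missing


# Toy stand-ins for the real objects (assumed, not taken from the dataset).
available_genes = pd.Index(['FOXF2', 'IRF2', 'PPARG'])
list_tf_motif = ['M1_FOXF2', 'M2_IRF2', 'M3_SREBF1']
list_tf_gene = ['FOXF2', 'IRF2', 'SREBF1']

kept, missing = split_pairs_by_presence(list_tf_motif, list_tf_gene, available_genes)
print("kept:", kept)
print("missing:", missing)
```

With the real data, available_genes would presumably be the gene entities in adata_all (for example, the obs_names whose annotation column marks them as genes). If 'SREBF1' shows up in the missing list, passing only the kept pairs to si.tl.find_master_regulators may avoid the error, although it would not explain why the gene is absent.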
Additionally, adata_PM.var_names looks strange: it appears to contain gene names rather than motifs. I used the hg38 annotation when processing the scATAC-seq data, so I also used the hg38 reference genome in SIMBA. Could this have any impact?
adata_PM.var_names
Index([ b'FOXF2', b'FOXD1', b'IRF2', b'MZF1(var.2)',
b'MAX_MYC', b'PPARG', b'PAX6', b'PBX1',
b'RORA', b'RORA(var.2)',
...
b'TEAD1', b'TEAD4', b'TFAP2A', b'TFAP2C(var.2)',
b'TWIST1', b'USF1', b'USF2', b'YY2',
b'ZNF263', b'CREM'],
dtype='object', length=633)
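The b'...' prefixes in this output indicate that the labels are Python bytes objects rather than str. In Python 3, 'FOXF2' == b'FOXF2' is False, so any lookup of these labels by plain string would fail. Whether this is related to the KeyError above is not established; the sketch below only shows how such labels could be decoded, using a toy index. The helper name `decode_labels` is hypothetical.

```python
import pandas as pd


def decode_labels(index, encoding='utf-8'):
    """Return a new Index with bytes labels decoded to str.

    Labels that are already str are left unchanged.
    """
    return pd.Index(
        [v.decode(encoding) if isinstance(v, bytes) else v for v in index]
    )


# Toy index mixing bytes and str labels (assumed example, not the real data).
raw = pd.Index([b'FOXF2', b'MAX_MYC', 'PPARG'])
clean = decode_labels(raw)
print(clean.tolist())  # ['FOXF2', 'MAX_MYC', 'PPARG']
```

On the real object this would amount to something like adata_PM.var_names = decode_labels(adata_PM.var_names), assuming the labels are UTF-8 encoded.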
In addition, the 'chr', 'start', and 'end' columns in adata_CP.var are derived by splitting the row names of the peak matrix output from Cell Ranger, as shown below.
chr_list, start_list, end_list = [], [], []
for var_name in adata_CP.var_names:
    # Peak names are assumed to look like 'chr1-100-200'
    parts = var_name.split('-')
    chr_list.append(parts[0])
    start_list.append(parts[1])
    end_list.append(parts[2])
len(adata_CP.var_names)
len(chr_list)
chr_df = pd.DataFrame({'chr': chr_list}, index=adata_CP.var_names)
adata_CP.var[['chr']] = chr_df
start_df = pd.DataFrame({'start': start_list}, index=adata_CP.var_names)
adata_CP.var[['start']] = start_df
end_df = pd.DataFrame({'end': end_list}, index=adata_CP.var_names)
adata_CP.var[['end']] = end_df
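For comparison, the same split can be written without the explicit loop. This is a minimal sketch on toy peak names, assuming they follow the 'chr-start-end' pattern described above. The helper name `split_peak_names` is hypothetical, and casting start and end to integers is my assumption about what downstream steps expect (the loop version stores them as strings). Splitting from the right keeps contig names that themselves contain a hyphen intact.

```python
import pandas as pd


def split_peak_names(peak_names):
    """Split 'chr-start-end' peak names into a DataFrame indexed by peak name."""
    names = list(peak_names)
    # rsplit with n=2 splits only on the last two hyphens.
    parts = pd.Series(names, index=names).str.rsplit('-', n=2, expand=True)
    parts.columns = ['chr', 'start', 'end']
    parts['start'] = parts['start'].astype(int)
    parts['end'] = parts['end'].astype(int)
    return parts


# Toy peak names (assumed format, not taken from the real matrix).
peaks = pd.Index(['chr1-100-200', 'chrX-5000-5600'])
coords = split_peak_names(peaks)
print(coords)
```

The result could then be assigned in one step, for example adata_CP.var[['chr', 'start', 'end']] = coords, since coords is indexed by the same peak names.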