Error running detect_transition_genes()
I am using the STREAM Python package, and I am having trouble running the detect_transition_genes() function on my data. Everything up to the streamplots and subwaymaps runs without any issues.
These are the steps I take to process my data with STREAM:
- I installed STREAM using bioconda as per the instructions on the STREAM github page.
- Since my snRNA-seq data was generated using the 10X platform, I load my data as the matrix.mtx file from cellranger aggr as outlined in the scRNA-seq tutorial for STREAM.
- I add a metadata table with clustering etc. from Seurat.
- I performed a batch correction method on my gene expression data (iNMF, implemented in the R package Liger). So I stuck my iNMF matrix into the adata.obsm['top_pcs'] slot so I could run STREAM using that matrix.
- Subset my anndata object for a specific cluster. This leaves me with about 35k cells.
- Normalize per cell, log transform, remove mt genes, filter genes using STREAM.
- call st.dimension_reduction(adata, method='se', nb_pct=0.01, n_jobs=16, feature='top_pcs')
- call st.seed_elastic_principal_graph(adata)
- call st.elastic_principal_graph(adata)
- call st.optimize_branching(adata, epg_trimmingradius=0.1)
- call st.extend_elastic_principal_graph(adata, epg_trimmingradius=0.1)
- Finally plot the flat tree, streamplot, subwaymap etc., all looking great, with branches corresponding well on a UMAP.
- call st.detect_transistion_genes(adata, root='S4')
This is where I get the error:
Minimum number of cells expressing genes: 39
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/data/users/smorabit/bin/software/miniconda3/envs/stream/lib/python3.6/site-packages/stream/core.py", line 3974, in detect_transistion_genes
input_genes_expressed = np.array(input_genes)[np.where((df_sc[input_genes]>0).sum(axis=0)>min_num_cells)[0]].tolist()
IndexError: index 59148 is out of bounds for axis 0 with size 58721
Interestingly, I tried running through the entire stream tutorial using the provided sample data (Nestorowa), and I did not run into the same error. Any ideas what is going on? Also, great work with this tool!
Hi, thanks for trying STREAM and for your kind words!
It looks like the error was caused by duplicate gene names in your matrix. Please see #14
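For illustration, here is a minimal sketch of how duplicate names can lead to this kind of IndexError. It uses a toy pandas table rather than STREAM's internals: the gene names and values are made up, and only the indexing pattern is borrowed from the line shown in the traceback.

```python
import numpy as np
import pandas as pd

# Toy expression table (hypothetical data): gene "A" appears twice.
genes = ["A", "A", "B"]
df_sc = pd.DataFrame(np.ones((4, 3)), columns=genes)

# Selecting by label returns every column that matches each requested
# label, so each "A" request yields two columns: 2 + 2 + 1 = 5 columns.
selected = df_sc[genes]
n_selected = selected.shape[1]

# Positions computed on the 5-column frame no longer fit the 3-name list,
# which is the same pattern as the failing line in the traceback.
positions = np.where((selected > 0).sum(axis=0) > 0)[0]
try:
    np.array(genes)[positions]
    raised = False
except IndexError:
    raised = True

print(n_selected, raised)
```

The more duplicated names there are, the more columns the selection returns than there are names, which would explain an index (59148) larger than the number of genes (58721).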
The issue can be solved by making your gene names unique:
adata.var_names_make_unique()
adata.raw = adata
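If you want to see which gene names are affected before renaming them, something along these lines may help. It is only a sketch on a plain pandas Index with made-up gene names; find_duplicate_names is a hypothetical helper, not part of STREAM or anndata. Since adata.var_names is a pandas Index, the same calls should apply to it.

```python
import pandas as pd

def find_duplicate_names(names: pd.Index) -> list:
    """Return each name that occurs more than once, listed a single time."""
    # keep=False marks every occurrence of a repeated name, not just the later ones.
    return names[names.duplicated(keep=False)].unique().tolist()

# Hypothetical gene names standing in for adata.var_names.
var_names = pd.Index(["Gm1", "Actb", "Gm1", "Sox2"])

dupes = find_duplicate_names(var_names)
print(var_names.is_unique)  # False while duplicates remain
print(dupes)
```

After making the names unique, var_names.is_unique should be True and the helper should return an empty list.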