Error running detect_transition_genes()

Open smorabit opened this issue 6 years ago • 1 comments

I am using the STREAM python package, and I am having trouble running the detect_transition_genes() function on my data. I can process everything just fine up to the streamplots and the subwaymaps without any issues.

These are the steps I take to process my data with STREAM:

  • I installed STREAM using bioconda as per the instructions on the STREAM github page.
  • Since my snRNA-seq data was generated using the 10X platform, I load my data as the matrix.mtx file from cellranger aggr as outlined in the scRNA-seq tutorial for STREAM.
  • I add a metadata table (clustering etc.) from Seurat.
  • I performed batch correction on my gene expression data (iNMF, implemented in the R package Liger), so I put my iNMF matrix into the adata.obsm['top_pcs'] slot so that I could run STREAM on that matrix.
  • Subset my anndata object for a specific cluster. This leaves me with about 35k cells.
  • Normalize per cell, log transform, remove mt genes, filter genes using STREAM.
  • call st.dimension_reduction(adata, method='se', nb_pct=0.01, n_jobs=16, feature='top_pcs')
  • call st.seed_elastic_principal_graph(adata)
  • call st.elastic_principal_graph(adata)
  • call st.optimize_branching(adata, epg_trimmingradius=0.1)
  • call st.extend_elastic_principal_graph(adata, epg_trimmingradius=0.1)
  • Finally plot the flat tree, streamplot, subwaymap etc., all looking great, with branches corresponding well to a UMAP.
  • call st.detect_transistion_genes(adata, root='S4')

This is where I get the error:

Minimum number of cells expressing genes: 39
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/users/smorabit/bin/software/miniconda3/envs/stream/lib/python3.6/site-packages/stream/core.py", line 3974, in detect_transistion_genes
    input_genes_expressed = np.array(input_genes)[np.where((df_sc[input_genes]>0).sum(axis=0)>min_num_cells)[0]].tolist()
IndexError: index 59148 is out of bounds for axis 0 with size 58721

Interestingly, I ran through the entire STREAM tutorial using the provided sample data (Nestorowa) and did not run into the same error. Any ideas what is going on? Also, great work with this tool!

smorabit, Oct 18 '19 20:10

Hi, thanks for trying STREAM, and for your kind words!

It looks like the error was caused by duplicated gene names in your matrix. Please see #14
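
To illustrate why duplicated names can produce this kind of IndexError, here is a minimal sketch on toy data. It assumes that df_sc is a pandas DataFrame whose columns are the gene names and that input_genes is the same list of names including the repeats; that is a guess based on the line in the traceback, not something checked against the STREAM source. The toy names and values are made up.

```python
import numpy as np
import pandas as pd

# Toy expression matrix: 3 cells x 4 genes, with the gene name "B" repeated.
genes = ["A", "B", "B", "C"]
df_sc = pd.DataFrame(np.ones((3, 4)), columns=genes)

# Selecting columns by label returns every matching column for each
# requested label, so each "B" in the request pulls in both "B" columns.
selected = df_sc[genes]
n_selected = selected.shape[1]  # more columns than there are names in `genes`

# Same pattern as the failing line in the traceback: positions computed on
# the widened selection are used to index the shorter list of names.
expressed = np.where((selected > 0).sum(axis=0) > 0)[0]
try:
    np.array(genes)[expressed]
    error = None
except IndexError as exc:
    error = exc

print(n_selected, len(genes))
print(error)
```

With unique names the selection has exactly one column per name, so the positions always stay within the bounds of the name list.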

The issue can be solved by making your gene names unique:

adata.var_names_make_unique()
adata.raw = adata
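
If you want to see which names are affected before renaming them, something like the following sketch may help. It uses only the standard library; find_duplicates and make_unique are hypothetical helpers written for illustration, and make_unique only approximates the idea of appending a numeric suffix to repeated names (it is not anndata's implementation and does not guard against a suffixed name colliding with an existing one). In practice you would pass in list(adata.var_names).

```python
from collections import Counter


def find_duplicates(names):
    """Return {name: count} for every name that occurs more than once."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


def make_unique(names, join="-"):
    """Keep the first occurrence of each name; suffix later ones with 1, 2, ..."""
    seen = Counter()
    unique = []
    for name in names:
        if seen[name] == 0:
            unique.append(name)
        else:
            unique.append(f"{name}{join}{seen[name]}")
        seen[name] += 1
    return unique


# Toy gene names standing in for list(adata.var_names).
names = ["GeneA", "GeneB", "GeneB", "GeneC", "GeneB"]
duplicates = find_duplicates(names)
renamed = make_unique(names)

print(duplicates)
print(renamed)
print(find_duplicates(renamed))
```

An empty result from the last check means every name is distinct, which is the condition the fix above is meant to establish.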

huidongchen, Oct 21 '19 18:10