cellBrowser
Load examples for scanpy are incorrect and misleading
The load documentation (load.rst) gives the following code for loading the downloaded data into a scanpy object:
import scanpy as sc
import pandas as pd
ad = sc.read_text("exprMatrix.tsv.gz")
meta = pd.read_csv("meta.tsv", sep="\t")
ad.var = meta
OR
import scanpy as sc
import pandas as pd
ad = sc.read_mtx("matrix.mtx.gz")
meta = pd.read_csv("meta.tsv", sep="\t")
ad.var = meta
I tried this, but the cell and sample metadata ended up in the var segment and the genes were listed in the obs segment. This problem has been reported in a handful of places across scanpy and anndata, one of which is here.
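One way to confirm which way round a loaded object is: check whether the cell IDs from the metadata match the row labels (obs_names) or the column labels (var_names) of the matrix. The sketch below assumes the first column of meta.tsv holds the cell IDs; `guess_orientation` is a hypothetical helper, not a scanpy or anndata function.

```python
from typing import Iterable


def guess_orientation(row_labels: Iterable[str], col_labels: Iterable[str],
                      cell_ids: Iterable[str]) -> str:
    """Hypothetical helper: report whether cells sit on the rows or the columns.

    Counts how many metadata cell IDs appear among the row labels versus the
    column labels of an expression matrix.
    """
    cells = set(cell_ids)
    in_rows = len(cells.intersection(row_labels))
    in_cols = len(cells.intersection(col_labels))
    if in_rows == 0 and in_cols == 0:
        return "unknown"
    # scanpy/anndata expect cells as rows (obs) and genes as columns (var)
    return "cells_x_genes" if in_rows >= in_cols else "genes_x_cells"
```

For example, `guess_orientation(ad.obs_names, ad.var_names, meta.iloc[:, 0])` returning `"genes_x_cells"` would indicate that the object needs to be transposed.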
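After some investigation, I found that the expression matrix (exprMatrix.tsv.gz) I downloaded from cells.ucsc.edu is transposed relative to what scanpy expects: it has genes as rows and cells as columns, whereas an AnnData object stores cells as observations (rows, obs) and genes as variables (columns, var). That mismatch causes this error, so users such as myself should be instructed to transpose the matrix before loading it into scanpy.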
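The orientation difference can be seen with pandas alone. The sketch below assumes the layout described above (the first column holds gene names and the header row holds cell IDs); the tiny matrix is made up for illustration, and `read_cells_by_genes` is a hypothetical helper name.

```python
import io

import pandas as pd


def read_cells_by_genes(source) -> pd.DataFrame:
    """Read a genes-by-cells TSV and return it as cells-by-genes."""
    # first column (gene names) becomes the row index; header row holds cell IDs
    genes_by_cells = pd.read_csv(source, sep="\t", index_col=0)
    # transpose so rows are cells (obs) and columns are genes (var)
    return genes_by_cells.T


# made-up example in the genes-as-rows layout
example = io.StringIO(
    "gene\tcellA\tcellB\tcellC\n"
    "GAPDH\t5\t0\t2\n"
    "ACTB\t1\t3\t0\n"
)
cells_by_genes = read_cells_by_genes(example)
print(cells_by_genes.shape)  # (3, 2): 3 cells, 2 genes
```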
I would make a PR, but it looks like I can't create a branch on this repo without forking it, so below are the changes I suggest for load.rst:
Scanpy
^^^^^^

To create an AnnData object in Scanpy when the expression matrix is a .tsv.gz file::

    import scanpy as sc
    import pandas as pd
    # read the expression matrix downloaded from cells.ucsc.edu
    # (tab-separated; the first column holds gene names, so use it as the row index)
    data = pd.read_csv("exprMatrix.tsv.gz", sep="\t", index_col=0)
    # transpose so that cells are rows (obs) and genes are columns (var)
    transposed_matrix = data.transpose()
    # write the transposed matrix to a file and then load it into scanpy
    transposed_matrix.to_csv("transposed_matrix.tsv", sep="\t")
    ad = sc.read_text("transposed_matrix.tsv")
    # read the cell metadata, indexed by cell ID, and put it into the obs segment
    # in the same order as the cells in the matrix
    meta = pd.read_csv("meta.tsv", sep="\t", index_col=0)
    ad.obs = meta.loc[ad.obs_names]
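
If the expression matrix is an MTX file::

    import scanpy as sc
    import pandas as pd
    ad = sc.read_mtx("matrix.mtx.gz")
    # read the cell metadata, using the first column (cell IDs) as the index;
    # assumes meta.tsv lists cells in the same order as the matrix rows
    meta = pd.read_csv("meta.tsv", sep="\t", index_col=0)
    ad.obs = meta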