cellBrowser Load examples for scanpy are incorrect and misleading

Open GeoffSCollins opened this issue 6 months ago • 0 comments

This file gives the following code for loading the downloaded data into a scanpy object:

import scanpy as sc
import pandas as pd
ad = sc.read_text("exprMatrix.tsv.gz")
meta = pd.read_csv("meta.tsv", sep="\t")
ad.var = meta

OR

import scanpy as sc
import pandas as pd
ad = sc.read_mtx("matrix.mtx.gz")
meta = pd.read_csv("meta.tsv", sep="\t")
ad.var = meta

I attempted this but ran into a problem: the cell and sample metadata ended up in the var segment and the genes were listed in the obs segment, which is the reverse of the AnnData convention (obs holds cells, var holds genes). This convention is described in a handful of places across the scanpy and anndata documentation.

After some investigation, I found that the expression matrix (exprMatrix.tsv.gz) I downloaded from cells.ucsc.edu is transposed relative to what scanpy expects: it has genes as rows and cells as columns, which caused the problem above. Users such as myself should therefore be instructed to transpose the matrix when loading it into scanpy.
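The orientation problem can be illustrated without scanpy. The following is a minimal sketch using pandas only; the tiny in-memory table, the cell names, and the gene names are made up for illustration and merely mimic the layout described above (a `gene` column followed by one column per cell).

```python
import io
import pandas as pd

# Hypothetical miniature of exprMatrix.tsv: genes as rows, cells as columns.
# The cell and gene names are invented for this sketch.
tsv_text = (
    "gene\tcellA\tcellB\tcellC\n"
    "GeneX\t0\t3\t1\n"
    "GeneY\t2\t0\t5\n"
)

# Read as tab-separated and use the gene column as the row index.
genes_by_cells = pd.read_csv(io.StringIO(tsv_text), sep="\t", index_col="gene")

# Transpose so that rows are cells (observations) and columns are genes (variables).
cells_by_genes = genes_by_cells.T

print(genes_by_cells.shape)  # (2, 3): 2 genes, 3 cells
print(cells_by_genes.shape)  # (3, 2): 3 cells, 2 genes
```

Comparing the shape against the known number of cells and genes is a quick way to tell which orientation a downloaded matrix has.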

I would open a PR for this repo, but it looks like I can't create a branch on the repo unless I fork it. So, below are the changes I would suggest for load.rst:

Scanpy
^^^^^^

To create an anndata object in Scanpy if the expression matrix is a .tsv.gz file::

    import scanpy as sc
    import pandas as pd

    # read the downloaded expression matrix from cells.ucsc.edu (tab-separated)
    data = pd.read_csv("exprMatrix.tsv.gz", sep="\t")
    
    # set the row index to be genes
    data.set_index('gene', inplace=True)
    
    # transpose the matrix
    transposed_matrix = data.transpose()
    
    # write the transposed matrix to a file and then load into scanpy
    transposed_matrix.to_csv("transposed_matrix.tsv", sep="\t")
    
    ad = sc.read_text("transposed_matrix.tsv")

    # read the metadata and put it into the obs segment
    meta = pd.read_csv("meta.tsv", sep="\t")
    ad.obs = meta

If the expression matrix is an MTX file::

    import scanpy as sc
    import pandas as pd
    ad = sc.read_mtx("matrix.mtx.gz")
    meta = pd.read_csv("meta.tsv", sep="\t")
    ad.obs = meta
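One caveat when assigning a metadata table to `ad.obs`: as far as I can tell, the assignment only checks that the lengths match and does not align rows by cell ID, so the metadata rows need to be in the same order as the cells in the matrix. A possible sanity check is sketched below with pandas only. It assumes the first column of meta.tsv holds the cell identifiers; the column names `cellId` and `cluster` and all values are hypothetical.

```python
import io
import pandas as pd

# Hypothetical cell IDs, in the order they appear in the expression matrix.
matrix_cells = pd.Index(["cellA", "cellB", "cellC"])

# Hypothetical meta.tsv whose rows are in a different order than the matrix.
meta_text = (
    "cellId\tcluster\n"
    "cellC\t2\n"
    "cellA\t1\n"
    "cellB\t1\n"
)

# Assumption: the first column contains the cell identifiers.
meta = pd.read_csv(io.StringIO(meta_text), sep="\t", index_col=0)

# Fail early if the metadata does not cover every cell in the matrix.
missing = matrix_cells.difference(meta.index)
if len(missing) > 0:
    raise ValueError(f"cells missing from metadata: {list(missing)}")

# Reorder the metadata rows to match the matrix, by ID rather than by position.
meta_aligned = meta.loc[matrix_cells]
print(meta_aligned)
```

With real data, `matrix_cells` would be the cell names of the loaded object (for example `ad.obs_names`), and `meta_aligned` would be the table assigned to `ad.obs`.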

Jul 30 '24 17:07 GeoffSCollins
