
Checking my markers for upcoming "Training my own classifier" steps

Open techgirl2022 opened this issue 2 years ago • 2 comments

I have a cds object file that I want to analyze as single-cell RNA-seq data (make a UMAP and identify cell types) using monocle and garnett. I was trying to check which marker genes are in my cds file.

cds1 <- readRDS('example_cds.RDS')
head(colData(cds1))
# shows annotations on each column as: cell (character), size factor (numeric), n.umi (numeric), perc_mitochondrial_umis (numeric), scrublet_score (numeric), scrublet_call (character), num_genes_expressed (integer)
head(rowData(cds1))
# shows annotations on each row as: gene_short_name (character), id (character), chromosome (character), bp1 (integer), bp2 (integer), gene_strand (character), num_cells_expressed (integer)

library(org.Mm.eg.db)
marker_file_path <- "C:/Users/[my username here]/Downloads/kidney_marker_genes.txt"
marker_check <- check_markers(cds1, marker_file_path,
                              db = org.Mm.eg.db,
                              cds_gene_id_type = "SYMBOL",
                              marker_file_gene_id_type = "SYMBOL")

plot_markers(marker_check)

However, I'm getting an error (see attached screenshot: check_markers_error).

Any suggestions on what I should do to troubleshoot this step?

techgirl2022, Aug 26 '22 21:08

Hello, this error usually means that there's a mismatch between the format of the genes in the database and the format in your cds object. You're using the Mm database, so just to check: does your cds have genes (in the row.names of rowData) in standard mouse symbol format (e.g. Cd4)? And does your marker file as well?

hpliner, Sep 03 '22 20:09

Yes, I used head(rowData(cds)) to check the format of the genes in my cds object. It shows the gene ID (ENSMUSG followed by 11 digits) and then gene_short_name in standard mouse symbol format (e.g. Gnai3). My marker file, which I made manually, looks like this:

#kidney_marker_genes.txt

Podocytes expressed: Nphs1, Nphs2, Synpo, Cdkn1c, Wt1, Cd2ap, Podxl

and the list continues for the rest of the cell types

techgirl2022, Sep 08 '22 23:09

Hello, sorry to butt in, but I think each cell type name in your marker file should start with >. Maybe it thinks Podocytes is a gene and is trying to convert that to a gene name?
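
For illustration, here is a minimal sketch of the layout I mean, written from R so it is easy to reproduce. The temp file path and the cell_types check are only there for the example and are not part of garnett; the gene list is the one from the post above.

```r
# Hypothetical example: write a small marker file in which the cell type
# name sits on its own line, prefixed with ">".
marker_lines <- c(
  ">Podocytes",
  "expressed: Nphs1, Nphs2, Synpo, Cdkn1c, Wt1, Cd2ap, Podxl"
)

# A temp file is used here for the sketch; use your own path in practice.
marker_file_path <- tempfile(fileext = ".txt")
writeLines(marker_lines, marker_file_path)

# Sanity check: list the cell type headers (lines beginning with ">").
file_lines <- readLines(marker_file_path)
cell_types <- sub("^>", "", grep("^>", file_lines, value = TRUE))
print(cell_types)
```

If cell_types comes back empty for your real file, the cell type lines are probably missing the leading >.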

clee700, Sep 21 '22 16:09

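Hi, sorry for the late response. If your cds object uses Ensembl IDs as its gene IDs, you need to set cds_gene_id_type = "ENSEMBL" instead of "SYMBOL". Your marker file uses symbols, so marker_file_gene_id_type = "SYMBOL" can stay as it is. Reopen if this doesn't solve the problem.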
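
A rough sketch of how you could check this before calling check_markers, assuming the row names of rowData(cds1) hold the gene IDs. guess_gene_id_type and example_ids are hypothetical names made up for this example, not garnett functions or real data from your object; the commented-out call reuses the arguments from your original post.

```r
# Hypothetical helper: guess whether a vector of mouse gene IDs looks like
# Ensembl IDs (ENSMUSG + 11 digits, optional ".version") or gene symbols.
guess_gene_id_type <- function(ids) {
  is_ensembl <- grepl("^ENSMUSG[0-9]{11}(\\.[0-9]+)?$", ids)
  if (all(is_ensembl)) {
    "ENSEMBL"
  } else if (!any(is_ensembl)) {
    "SYMBOL"
  } else {
    "MIXED"  # a mix of formats would need cleaning up first
  }
}

# Made-up IDs, only to show the helper on both formats.
example_ids <- c("ENSMUSG00000000001", "ENSMUSG00000000028")
print(guess_gene_id_type(example_ids))
print(guess_gene_id_type(c("Nphs1", "Wt1")))

# With the real object (requires monocle3, garnett and org.Mm.eg.db):
# cds_id_type <- guess_gene_id_type(row.names(rowData(cds1)))
# marker_check <- check_markers(cds1, marker_file_path,
#                               db = org.Mm.eg.db,
#                               cds_gene_id_type = cds_id_type,
#                               marker_file_gene_id_type = "SYMBOL")
```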

hpliner, Oct 23 '22 12:10