
Using marker gene lists of different lengths

Open sofiapuvogelvittini opened this issue 2 years ago • 1 comments

Hello, thanks for developing this package. I am trying to compare my clusters with the clusters of a previously published scRNA-seq dataset.

The reference clusters don't all have the same number of marker genes, so I am padding the data frame that holds the marker genes per reference cluster with NA values. How does clustify_lists() treat the NA values? Could this affect my results? All the best and thanks for your time, Sofía

sofiapuvogelvittini avatar May 25 '22 10:05 sofiapuvogelvittini

Hi, you can pass the reference markers as a list instead of a same-length data frame. Example below:

# pbmc_markers as FindAllMarkers output gene list
pbmc_input <- split(pbmc_markers$gene, pbmc_markers$cluster)

# reference gene list that is uneven length
pbmc_ref <- pos_neg_marker(
  list(B = c("CD79A", "CD79B", "MS4A1"), NK = c("GZMB", "GNLY"))
)

# reverse input and reference
res <- clustify_lists(
  pbmc_ref,
  pbmc_input,
  metric = "jaccard",
  input_markers = TRUE
)
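If the reference markers are already stored in an NA-padded data frame, one possible way to get an uneven-length list like the one above is to drop the NA entries column by column. This is only a minimal base R sketch: `ref_df` and `ref_list` are made-up names for illustration, and it assumes one column per reference cluster.

```r
# Hypothetical NA-padded marker table: one column per reference cluster.
ref_df <- data.frame(
  B  = c("CD79A", "CD79B", "MS4A1"),
  NK = c("GZMB", "GNLY", NA),
  stringsAsFactors = FALSE
)

# Drop the NA padding column by column. lapply() over a data frame
# returns a named list, so each cluster keeps its own marker count.
ref_list <- lapply(ref_df, function(genes) genes[!is.na(genes)])

str(ref_list)
```

`ref_list` has the same shape as the list passed to `pos_neg_marker()` above, so the NA padding never reaches the comparison.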

raysinensis avatar Jun 22 '22 15:06 raysinensis