Accurate and fast cell marker gene identification with COSG

COSG in R

COSG is a cosine similarity-based method for more accurate and scalable marker gene identification.

  • COSG is a general method for cell marker gene identification across different data modalities, e.g., scRNA-seq, scATAC-seq and spatially resolved transcriptome data.
  • Marker genes or genomic regions identified by COSG are more indicative of their cell type and show greater cell-type specificity.
  • COSG is ultrafast for large-scale datasets, and is capable of identifying marker genes for one million cells in less than two minutes.
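
To make the idea of a cosine similarity-based score more concrete, here is a minimal base-R sketch on a made-up toy matrix. It only computes the cosine similarity between each gene's expression vector and a 0/1 indicator vector for each cell group; it is an illustration of the underlying measure, not the COSG implementation, and every name in it (`expr`, `groups`, `indicator`, `cos_sim`) is hypothetical.

```r
# Toy expression matrix: rows are genes, columns are cells (values are made up).
expr <- matrix(
  c(5, 4, 6, 0, 0, 0,   # geneA: expressed only in group "a"
    0, 1, 0, 3, 4, 5,   # geneB: expressed mostly in group "b"
    2, 2, 2, 2, 2, 2),  # geneC: uniform across all cells
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:6))
)
groups <- factor(c("a", "a", "a", "b", "b", "b"))

# One indicator column per group: 1 if the cell belongs to the group, else 0.
indicator <- sapply(levels(groups), function(g) as.numeric(groups == g))

# Cosine similarity = dot product divided by the product of the vector norms.
dot <- expr %*% indicator
gene_norm <- sqrt(rowSums(expr^2))
group_norm <- sqrt(colSums(indicator^2))
cos_sim <- dot / outer(gene_norm, group_norm)

round(cos_sim, 3)
```

In this toy example, a gene restricted to one group (geneA) is far more similar to that group's indicator than a uniformly expressed gene (geneC), which is similar to both groups to the same degree.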

The method and benchmarking results are described in Dai et al. (2022). The preprint is available on bioRxiv.

This is the R version of COSG; the Python version is hosted at https://github.com/genecell/COSG.

Installation

# install.packages('remotes')
remotes::install_github(repo = 'genecell/COSGR')

Usage

Please check out the vignette and the PBMC10K tutorial to get started.

suppressMessages(library(Seurat))
data('pbmc_small',package='Seurat')
# Check cell groups:
table(Idents(pbmc_small))
#> 
#>  0  1  2 
#> 36 25 19 
#######
# Run COSG:
marker_cosg <- cosg(
 pbmc_small,
 groups='all',
 assay='RNA',
 slot='data',
 mu=1,
 n_genes_user=100)
#######
# Check the marker genes:
head(marker_cosg$names)
#>       0      1     2
#> 1   CD7 S100A8 MS4A1
#> 2  CCL5   TYMP CD79A
#> 3  GNLY S100A9 TCL1A
#> 4 LAMP1  FCGRT  NT5C
#> 5  GZMA IFITM3 CD79B
#> 6   LCK   LST1 FCER2
head(marker_cosg$scores)
#>           0         1         2
#> 1 0.6391917 0.8954042 0.6922908
#> 2 0.6391267 0.8312083 0.5832425
#> 3 0.6328148 0.8120045 0.5757478
#> 4 0.6164937 0.7755955 0.5533107
#> 5 0.5846589 0.7413060 0.5163446
#> 6 0.5795238 0.7380483 0.5115180
#######
# Run COSG for selected groups, i.e., '0' and '2':
marker_cosg <- cosg(
 pbmc_small,
 groups=c('0', '2'),
 assay='RNA',
 slot='data',
 mu=1,
 n_genes_user=100)
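
For downstream work it can be convenient to reshape the result into one long table. The sketch below assumes, based on the printed output above, that `names` and `scores` are data frames with one column per cell group. To stay self-contained it builds a mock `marker_cosg` from the first three rows of that output; the helper name `marker_long` is hypothetical.

```r
# Mock result mirroring the first three rows of the output shown above;
# a real run of cosg() would supply this list instead.
marker_cosg <- list(
  names = data.frame(
    `0` = c("CD7", "CCL5", "GNLY"),
    `1` = c("S100A8", "TYMP", "S100A9"),
    `2` = c("MS4A1", "CD79A", "TCL1A"),
    check.names = FALSE, stringsAsFactors = FALSE
  ),
  scores = data.frame(
    `0` = c(0.6391917, 0.6391267, 0.6328148),
    `1` = c(0.8954042, 0.8312083, 0.8120045),
    `2` = c(0.6922908, 0.5832425, 0.5757478),
    check.names = FALSE
  )
)

# Stack the per-group columns into one long table: group, rank, gene, score.
marker_long <- do.call(rbind, lapply(colnames(marker_cosg$names), function(g) {
  data.frame(
    group = g,
    rank = seq_len(nrow(marker_cosg$names)),
    gene = marker_cosg$names[[g]],
    score = marker_cosg$scores[[g]],
    stringsAsFactors = FALSE
  )
}))

# Top two markers for group '2'.
subset(marker_long, group == "2" & rank <= 2)
```

A long table like this is easy to filter by group or rank and to pass on to plotting or export functions.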

Tip

  1. To identify more specific marker genes, set mu to a larger value, such as mu=10 or mu=100.
  2. To exclude genes that are expressed in only a small fraction of cells in the target cell group, set remove_lowly_expressed to TRUE and use expressed_pct to adjust the percentage threshold. For example:
# Here `seo` is assumed to be a Seurat object that contains a 'peaks' assay.
marker_region <- cosg(
 seo,
 groups='all',
 assay='peaks',
 slot='data',
 mu=100,
 n_genes_user=100,
 remove_lowly_expressed=TRUE,
 expressed_pct=0.1)

Citation

If COSG is useful for your research, please consider citing Dai, M., Pei, X., Wang, X.-J., 2022. Accurate and fast cell marker gene identification with COSG. Brief. Bioinform. bbab579.