SCP
SCP copied to clipboard
An end-to-end Single-Cell Pipeline designed to facilitate comprehensive analysis and exploration of single-cell data.
SCP: Single-Cell Pipeline
SCP provides a comprehensive set of tools for single-cell data processing and downstream analysis.
The package includes the following facilities:
- Integrated single-cell quality control methods.
- Pipelines embedded with multiple methods for normalization, feature reduction, and cell population identification (standard Seurat workflow).
- Pipelines embedded with multiple integration methods for scRNA-seq or scATAC-seq data, including Uncorrected, Seurat, scVI, MNN, fastMNN, Harmony, Scanorama, BBKNN, CSS, LIGER, Conos, ComBat.
- Multiple single-cell downstream analyses such as identification of differential features, enrichment analysis, GSEA analysis, identification of dynamic features, PAGA, RNA velocity, Palantir, Monocle2, Monocle3, etc.
- Multiple methods for automatic annotation of single-cell data and methods for projection between single-cell datasets.
- High-quality data visualization methods.
- Fast deployment of single-cell data into SCExplorer, a shiny app that provides an interactive visualization interface.
The functions in the SCP package are all developed around the Seurat object and are compatible with other Seurat functions.
R version requirement
- R >= 4.1.0
Installation in the global R environment
You can install the latest version of SCP from GitHub with:
if (!require("devtools", quietly = TRUE)) {
install.packages("devtools")
}
devtools::install_github("zhanghao-njmu/SCP")
Create a python environment for SCP
To run functions such as RunPAGA
or RunSCVELO
, SCP requires
conda to create a
separate python environment. The default environment name is
"SCP_env"
. You can specify the environment name for SCP by setting
options(SCP_env_name="new_name")
Now, you can run PrepareEnv()
to create the python environment for
SCP. If the conda binary is not found, it will automatically download
and install miniconda.
SCP::PrepareEnv()
To force SCP to use a specific conda binary, it is recommended to set
reticulate.conda_binary
R option:
options(reticulate.conda_binary = "/path/to/conda")
SCP::PrepareEnv()
If the download of miniconda or pip packages is slow, you can specify the miniconda repo and PyPI mirror according to your network region.
SCP::PrepareEnv(
miniconda_repo = "https://mirrors.bfsu.edu.cn/anaconda/miniconda",
pip_options = "-i https://pypi.tuna.tsinghua.edu.cn/simple"
)
Available miniconda repositories:
-
https://repo.anaconda.com/miniconda (default)
Available PyPI mirrors:
-
https://pypi.python.org/simple (default)
Installation in an isolated R environment using renv
If you do not want to change your current R environment or require reproducibility, you can use the renv package to install SCP into an isolated R environment.
Create an isolated R environment
if (!require("renv", quietly = TRUE)) {
install.packages("renv")
}
dir.create("~/SCP_env", recursive = TRUE) # It cannot be the home directory "~" !
renv::init(project = "~/SCP_env", bare = TRUE, restart = TRUE)
Option 1: Install SCP from GitHub and create SCP python environment
renv::activate(project = "~/SCP_env")
renv::install("BiocManager")
renv::install("zhanghao-njmu/SCP", repos = BiocManager::repositories())
SCP::PrepareEnv()
Option 2: If SCP is already installed in the global environment, copy SCP from the local library
renv::activate(project = "~/SCP_env")
renv::hydrate("SCP")
SCP::PrepareEnv()
Activate SCP environment first before use
renv::activate(project = "~/SCP_env")
library(SCP)
data("pancreas_sub")
pancreas_sub <- RunPAGA(srt = pancreas_sub, group_by = "SubCellType", linear_reduction = "PCA", nonlinear_reduction = "UMAP")
CellDimPlot(pancreas_sub, group.by = "SubCellType", reduction = "draw_graph_fr")
Save and restore the state of SCP environment
renv::snapshot(project = "~/SCP_env")
renv::restore(project = "~/SCP_env")
Quick Start
-
Data exploration
-
CellQC
-
Standard pipeline
-
Integration pipeline
-
Cell projection between single-cell datasets
-
Cell annotation using bulk RNA-seq datasets
-
Cell annotation using single-cell datasets
-
PAGA analysis
-
Velocity analysis
-
Differential expression analysis
-
Enrichment analysis(over-representation)
-
Enrichment analysis(GSEA)
-
Trajectory inference
-
Dynamic features
-
Interactive data visualization with SCExplorer
-
Other visualization examples
Data exploration
The analysis is based on a subsetted version of mouse pancreas data.
library(SCP)
library(BiocParallel)
register(MulticoreParam(workers = 8, progressbar = TRUE))
data("pancreas_sub")
print(pancreas_sub)
#> An object of class Seurat
#> 47874 features across 1000 samples within 3 assays
#> Active assay: RNA (15958 features, 3467 variable features)
#> 2 other assays present: spliced, unspliced
#> 2 dimensional reductions calculated: PCA, UMAP
CellDimPlot(
srt = pancreas_sub, group.by = c("CellType", "SubCellType"),
reduction = "UMAP", theme_use = "theme_blank"
)
data:image/s3,"s3://crabby-images/90e29/90e290a5c5c04886def79a3e2f21da2de77e282c" alt=""
CellDimPlot(
srt = pancreas_sub, group.by = "SubCellType", stat.by = "Phase",
reduction = "UMAP", theme_use = "theme_blank"
)
data:image/s3,"s3://crabby-images/fb1e4/fb1e41844ee89f5793fc189b08c3ce2fc5daf703" alt=""
FeatureDimPlot(
srt = pancreas_sub, features = c("Sox9", "Neurog3", "Fev", "Rbp4"),
reduction = "UMAP", theme_use = "theme_blank"
)
data:image/s3,"s3://crabby-images/f33cb/f33cb1d1bb1481c68d969530c09d9bf6c32cbea6" alt=""
FeatureDimPlot(
srt = pancreas_sub, features = c("Ins1", "Gcg", "Sst", "Ghrl"),
compare_features = TRUE, label = TRUE, label_insitu = TRUE,
reduction = "UMAP", theme_use = "theme_blank"
)
data:image/s3,"s3://crabby-images/5f382/5f3828d2ddc972e1347bb7370a349385cfbca777" alt=""
ht <- GroupHeatmap(
srt = pancreas_sub,
features = c(
"Sox9", "Anxa2", # Ductal
"Neurog3", "Hes6", # EPs
"Fev", "Neurod1", # Pre-endocrine
"Rbp4", "Pyy", # Endocrine
"Ins1", "Gcg", "Sst", "Ghrl" # Beta, Alpha, Delta, Epsilon
),
group.by = c("CellType", "SubCellType"),
heatmap_palette = "YlOrRd",
cell_annotation = c("Phase", "G2M_score", "Cdh2"),
cell_annotation_palette = c("Dark2", "Paired", "Paired"),
show_row_names = TRUE, row_names_side = "left",
add_dot = TRUE, add_reticle = TRUE
)
print(ht$plot)
data:image/s3,"s3://crabby-images/f005c/f005c6c3b9ee31095101177efc27bf2fc219f9ab" alt=""
CellQC
pancreas_sub <- RunCellQC(srt = pancreas_sub)
CellDimPlot(srt = pancreas_sub, group.by = "CellQC", reduction = "UMAP")
data:image/s3,"s3://crabby-images/b5d2a/b5d2a7f5637a696c4d0ad78970a1e706b357b592" alt=""
CellStatPlot(srt = pancreas_sub, stat.by = "CellQC", group.by = "CellType", label = TRUE)
data:image/s3,"s3://crabby-images/26170/2617077875dae9076784eab31b84599e938a1026" alt=""
CellStatPlot(
srt = pancreas_sub,
stat.by = c(
"db_qc", "outlier_qc", "umi_qc", "gene_qc",
"mito_qc", "ribo_qc", "ribo_mito_ratio_qc", "species_qc"
),
plot_type = "upset", stat_level = "Fail"
)
data:image/s3,"s3://crabby-images/c455a/c455a8097ae89e8ddf8b1fc06c4b9844edcc98f2" alt=""
Standard pipeline
pancreas_sub <- Standard_SCP(srt = pancreas_sub)
CellDimPlot(
srt = pancreas_sub, group.by = c("CellType", "SubCellType"),
reduction = "StandardUMAP2D", theme_use = "theme_blank"
)
data:image/s3,"s3://crabby-images/ad91b/ad91bb5dc159b5b04e72a04f8cd2b8ccfb878ef0" alt=""
CellDimPlot3D(srt = pancreas_sub, group.by = "SubCellType")
FeatureDimPlot3D(srt = pancreas_sub, features = c("Sox9", "Neurog3", "Fev", "Rbp4"))
Integration pipeline
Example data for integration is a subsetted version of panc8(eight human pancreas datasets)
data("panc8_sub")
panc8_sub <- Integration_SCP(srtMerge = panc8_sub, batch = "tech", integration_method = "Seurat")
CellDimPlot(
srt = panc8_sub, group.by = c("celltype", "tech"), reduction = "SeuratUMAP2D",
title = "Seurat", theme_use = "theme_blank"
)
data:image/s3,"s3://crabby-images/6928e/6928eebc0455d80a1cccd3d27dab717784df6635" alt=""
UMAP embeddings based on different integration methods in SCP:
Cell projection between single-cell datasets
panc8_rename <- RenameFeatures(
srt = panc8_sub,
newnames = make.unique(capitalize(rownames(panc8_sub[["RNA"]]), force_tolower = TRUE)),
assays = "RNA"
)
srt_query <- RunKNNMap(srt_query = pancreas_sub, srt_ref = panc8_rename, ref_umap = "SeuratUMAP2D")
ProjectionPlot(
srt_query = srt_query, srt_ref = panc8_rename,
query_group = "SubCellType", ref_group = "celltype"
)
data:image/s3,"s3://crabby-images/d3652/d36527ce70c8064d00fbb1c089df9353e0db2caa" alt=""
Cell annotation using bulk RNA-seq datasets
data("ref_scMCA")
pancreas_sub <- RunKNNPredict(srt_query = pancreas_sub, bulk_ref = ref_scMCA, filter_lowfreq = 20)
CellDimPlot(srt = pancreas_sub, group.by = "KNNPredict_classification", reduction = "UMAP", label = TRUE)
data:image/s3,"s3://crabby-images/4b0ad/4b0ad293894f272f33470e2331f316c2354b7555" alt=""
Cell annotation using single-cell datasets
pancreas_sub <- RunKNNPredict(
srt_query = pancreas_sub, srt_ref = panc8_rename,
ref_group = "celltype", filter_lowfreq = 20
)
CellDimPlot(srt = pancreas_sub, group.by = "KNNPredict_classification", reduction = "UMAP", label = TRUE)
data:image/s3,"s3://crabby-images/206a4/206a4a494b9bc00a29a3d94e2eb99fb44c4eb077" alt=""
pancreas_sub <- RunKNNPredict(
srt_query = pancreas_sub, srt_ref = panc8_rename,
query_group = "SubCellType", ref_group = "celltype",
return_full_distance_matrix = TRUE
)
CellDimPlot(srt = pancreas_sub, group.by = "KNNPredict_classification", reduction = "UMAP", label = TRUE)
data:image/s3,"s3://crabby-images/6de9a/6de9ae5a7d4640f4356e5f0214b44dbc782cd092" alt=""
ht <- CellCorHeatmap(
srt_query = pancreas_sub, srt_ref = panc8_rename,
query_group = "SubCellType", ref_group = "celltype",
nlabel = 3, label_by = "row",
show_row_names = TRUE, show_column_names = TRUE
)
print(ht$plot)
data:image/s3,"s3://crabby-images/8e8ac/8e8ac0418d6c7dee4111b90a2604b6f7514bcd03" alt=""
PAGA analysis
pancreas_sub <- RunPAGA(
srt = pancreas_sub, group_by = "SubCellType",
linear_reduction = "PCA", nonlinear_reduction = "UMAP"
)
PAGAPlot(srt = pancreas_sub, reduction = "UMAP", label = TRUE, label_insitu = TRUE, label_repel = TRUE)
data:image/s3,"s3://crabby-images/f9a7b/f9a7b99fc69eca1478fb3b478dc5afcee0f7fceb" alt=""
Velocity analysis
To estimate RNA velocity, you need to have both “spliced” and “unspliced” assays in your Seurat object. You can generate these matrices using velocyto, bustools, or alevin.
pancreas_sub <- RunSCVELO(
srt = pancreas_sub, group_by = "SubCellType",
linear_reduction = "PCA", nonlinear_reduction = "UMAP"
)
VelocityPlot(srt = pancreas_sub, reduction = "UMAP", group_by = "SubCellType")
data:image/s3,"s3://crabby-images/701ac/701ac496f00eb8921b56fdaf165e50240be6e071" alt=""
VelocityPlot(srt = pancreas_sub, reduction = "UMAP", plot_type = "stream")
data:image/s3,"s3://crabby-images/73a00/73a006502af078c51dec8e20cafd1c899dc0af0a" alt=""
Differential expression analysis
pancreas_sub <- RunDEtest(srt = pancreas_sub, group_by = "CellType", fc.threshold = 1, only.pos = FALSE)
VolcanoPlot(srt = pancreas_sub, group_by = "CellType")
data:image/s3,"s3://crabby-images/f0965/f0965bd59618603b873ed81ca6a3966d9f1fd12c" alt=""
DEGs <- pancreas_sub@tools$DEtest_CellType$AllMarkers_wilcox
DEGs <- DEGs[with(DEGs, avg_log2FC > 1 & p_val_adj < 0.05), ]
# Annotate features with transcription factors and surface proteins
pancreas_sub <- AnnotateFeatures(pancreas_sub, species = "Mus_musculus", db = c("TF", "CSPA"))
ht <- FeatureHeatmap(
srt = pancreas_sub, group.by = "CellType", features = DEGs$gene, feature_split = DEGs$group1,
species = "Mus_musculus", db = c("GO_BP", "KEGG", "WikiPathway"), anno_terms = TRUE,
feature_annotation = c("TF", "CSPA"), feature_annotation_palcolor = list(c("gold", "steelblue"), c("forestgreen")),
height = 5, width = 4
)
print(ht$plot)
data:image/s3,"s3://crabby-images/59363/5936307519b6809d4c038e213142318e6f873d97" alt=""
Enrichment analysis(over-representation)
pancreas_sub <- RunEnrichment(
srt = pancreas_sub, group_by = "CellType", db = "GO_BP", species = "Mus_musculus",
DE_threshold = "avg_log2FC > log2(1.5) & p_val_adj < 0.05"
)
EnrichmentPlot(
srt = pancreas_sub, group_by = "CellType", group_use = c("Ductal", "Endocrine"),
plot_type = "bar"
)
data:image/s3,"s3://crabby-images/96d49/96d4905927108348f7c225978514ac485c994190" alt=""
EnrichmentPlot(
srt = pancreas_sub, group_by = "CellType", group_use = c("Ductal", "Endocrine"),
plot_type = "wordcloud"
)
data:image/s3,"s3://crabby-images/8540a/8540aef7c5437f0b5c86da7d455d02713c9b5f12" alt=""
EnrichmentPlot(
srt = pancreas_sub, group_by = "CellType", group_use = c("Ductal", "Endocrine"),
plot_type = "wordcloud", word_type = "feature"
)
data:image/s3,"s3://crabby-images/6ccd8/6ccd85c9246d167139a1e24e886743ee1194c073" alt=""
EnrichmentPlot(
srt = pancreas_sub, group_by = "CellType", group_use = "Ductal",
plot_type = "network"
)
data:image/s3,"s3://crabby-images/cba38/cba387d5668a2962b931ea19369e492e0ae5aa1b" alt=""
To ensure that labels are visible, you can adjust the size of the viewer panel on Rstudio IDE.
EnrichmentPlot(
srt = pancreas_sub, group_by = "CellType", group_use = "Ductal",
plot_type = "enrichmap"
)
data:image/s3,"s3://crabby-images/b2991/b29913fe951486433937f567de8bc3183b45ff60" alt=""
EnrichmentPlot(srt = pancreas_sub, group_by = "CellType", plot_type = "comparison")
data:image/s3,"s3://crabby-images/07ea5/07ea51230c17bc2831450e95227f239fe2ea6921" alt=""
Enrichment analysis(GSEA)
pancreas_sub <- RunGSEA(
srt = pancreas_sub, group_by = "CellType", db = "GO_BP", species = "Mus_musculus",
DE_threshold = "p_val_adj < 0.05"
)
GSEAPlot(srt = pancreas_sub, group_by = "CellType", group_use = "Endocrine", id_use = "GO:0007186")
data:image/s3,"s3://crabby-images/ad9dd/ad9ddc9f9dca2e243ad852474d7085e875e7e84c" alt=""
GSEAPlot(
srt = pancreas_sub, group_by = "CellType", group_use = "Endocrine", plot_type = "bar",
direction = "both", topTerm = 20
)
data:image/s3,"s3://crabby-images/5fba3/5fba3082a009a5913bd7b2a2411323772dc71d26" alt=""
GSEAPlot(srt = pancreas_sub, group_by = "CellType", plot_type = "comparison")
data:image/s3,"s3://crabby-images/afcfe/afcfe13b8278b0d5894147b05d86e2fbc43424c0" alt=""
Trajectory inference
pancreas_sub <- RunSlingshot(srt = pancreas_sub, group.by = "SubCellType", reduction = "UMAP")
data:image/s3,"s3://crabby-images/c29cc/c29ccbb208c7cf0386c34f60abdac94f2c4161d7" alt=""
FeatureDimPlot(pancreas_sub, features = paste0("Lineage", 1:3), reduction = "UMAP", theme_use = "theme_blank")
data:image/s3,"s3://crabby-images/2010a/2010a1504211ad3895f89eb678e6d35cdc90e9f7" alt=""
CellDimPlot(pancreas_sub, group.by = "SubCellType", reduction = "UMAP", lineages = paste0("Lineage", 1:3), lineages_span = 0.1)
data:image/s3,"s3://crabby-images/a0ab7/a0ab76ab055fec368bd1f6c2b2453ed6cd09dcbf" alt=""
Dynamic features
pancreas_sub <- RunDynamicFeatures(srt = pancreas_sub, lineages = c("Lineage1", "Lineage2"), n_candidates = 200)
ht <- DynamicHeatmap(
srt = pancreas_sub, lineages = c("Lineage1", "Lineage2"),
use_fitted = TRUE, n_split = 6, reverse_ht = "Lineage1",
species = "Mus_musculus", db = "GO_BP", anno_terms = TRUE, anno_keys = TRUE, anno_features = TRUE,
heatmap_palette = "viridis", cell_annotation = "SubCellType",
separate_annotation = list("SubCellType", c("Nnat", "Irx1")), separate_annotation_palette = c("Paired", "Set1"),
feature_annotation = c("TF", "CSPA"), feature_annotation_palcolor = list(c("gold", "steelblue"), c("forestgreen")),
pseudotime_label = 25, pseudotime_label_color = "red",
height = 5, width = 2
)
print(ht$plot)
data:image/s3,"s3://crabby-images/a85a2/a85a25995dd741f3cac14c081a7d69470adb7150" alt=""
DynamicPlot(
srt = pancreas_sub, lineages = c("Lineage1", "Lineage2"), group.by = "SubCellType",
features = c("Plk1", "Hes1", "Neurod2", "Ghrl", "Gcg", "Ins2"),
compare_lineages = TRUE, compare_features = FALSE
)
data:image/s3,"s3://crabby-images/4b4d6/4b4d6e130b47b11154463ad7499e77987bb1b3ef" alt=""
FeatureStatPlot(
srt = pancreas_sub, group.by = "SubCellType", bg.by = "CellType",
stat.by = c("Sox9", "Neurod2", "Isl1", "Rbp4"), add_box = TRUE,
comparisons = list(
c("Ductal", "Ngn3 low EP"),
c("Ngn3 high EP", "Pre-endocrine"),
c("Alpha", "Beta")
)
)
data:image/s3,"s3://crabby-images/43596/4359674af1f2eaf4e2cfb09cec5a4485b19b3a4f" alt=""
Interactive data visualization with SCExplorer
PrepareSCExplorer(list(mouse_pancreas = pancreas_sub, human_pancreas = panc8_sub), base_dir = "./SCExplorer")
app <- RunSCExplorer(base_dir = "./SCExplorer")
list.files("./SCExplorer") # This directory can be used as site directory for Shiny Server.
if (interactive()) {
shiny::runApp(app)
}
Other visualization examples
CellDimPlot
CellStatPlot
FeatureStatPlot
GroupHeatmap
You can also find more examples in the documentation of the function: Integration_SCP, RunKNNMap, RunMonocle3, RunPalantir, etc.