seurat
seurat copied to clipboard
Issue with the merge function of Seurat handling objects with different sets of genes
Hi,
I believe there is an issue with the merge
function of Seurat
. The problem lies in the way Seurat
handles the feature.attributes
and scale.data
slots of two objects with different sets of expressed genes (though with a high overlap) on which Seurat::SCTransform()
was computed. Running the code in two different ways (but essentially identical in terms of the expected outcome) results in very different results:
- Merge the two
Seurat
objects then runSeurat::SCTransform()
(referred to as the "standardSeurat
version 5 workflow" in the code); - Run
Seurat::SCTransform()
then merge the objects (referred to as the "lapply approach" in the code).
For reproducibility, please check the code below:
# Required packages
require(Seurat)
# Load Broad Institute PBMC Systematic Comparative Analysis dataset
obj <- SeuratData::LoadData("pbmcsca")
# Subset two datasets and prepare data for processing
obj.l <- lapply(
X = c("smartseq2" = "Smart-seq2",
"celseq2" = "CEL-Seq2"),
FUN = function(method,
obj) {
# Retrieve counts data for dataset from each method
counts.mat <- GetAssayData(object = subset(x = obj,
subset = Method == method),
assay = "RNA",
layer = "counts")
# Remove all genes not expressed in at least 3 cells
genes.to.keep <- rowSums(x = counts.mat > 0) >= 3
counts.mat <- counts.mat[genes.to.keep,]
# Create a new Seurat object
obj <- CreateSeuratObject(
counts = counts.mat,
assay = "RNA",
min.cells = 0,
min.features = 0,
)
# Add method to new Seurat object
[email protected]$Method <- method
# Return object
return(obj)
},
obj = obj)
# Get maximum number of cells and genes
ncells <- Reduce(f = max, x = lapply(X = obj.l, FUN = ncol))
ngenes <- Reduce(f = max, x = lapply(X = obj.l, FUN = nrow))
# Prepare data for standard Seurat version 5 workflow
obj.merged.v5 <- merge(x = obj.l$smartseq2,
y = obj.l$celseq2,
add.cell.ids = NULL,
collapse = FALSE,
merge.data = TRUE,
merge.dr = FALSE)
# Run sctransform following standard Seurat version 5 workflow
obj.merged.v5 <- SCTransform(object = obj.merged.v5,
ncells = ncells,
variable.features.n = ngenes,
clip.range = c(-sqrt(x = ncells / 30),
sqrt(x = ncells / 30)), # same clipping range for the two approaches
n_genes = ngenes)
# Run sctransform using lapply
obj.l <- lapply(X = obj.l,
FUN = SCTransform,
ncells = ncells,
variable.features.n = ngenes,
clip.range = c(-sqrt(x = ncells / 30),
sqrt(x = ncells / 30)), # same clipping range for the two approaches
n_genes = ngenes)
# Merge Seurat objects
obj.merged.l5 <- merge(x = obj.l$smartseq2,
y = obj.l$celseq2,
add.cell.ids = NULL,
collapse = FALSE,
merge.data = TRUE,
merge.dr = FALSE)
# The feature.attributes slots do not match between the two approaches
mapply(v5 = obj.merged.v5@[email protected],
l5 = obj.merged.l5@[email protected],
FUN = function(v5,
l5) {
table([email protected] == [email protected][rownames(x = [email protected]),]) # rownames don't match either
},
SIMPLIFY = FALSE,
USE.NAMES = TRUE) # returns TRUE for the first dataset and TRUE and FALSE for the second
# The scale.data slots do not match either between the two approaches
lapply(X = 1:10, # check the first 10 genes
FUN = function(i,
v5,
l5) {
table(v5[i,] == l5[i,colnames(x = v5)])
},
v5 = obj.merged.v5@[email protected],
l5 = obj.merged.l5@[email protected])
# The feature.attributes slots match for the lapply approach before and after merging
mapply(l = obj.l,
l5 = obj.merged.l5@[email protected],
FUN = function(l,
l5) {
table(l@[email protected][email protected] == [email protected])
},
SIMPLIFY = FALSE,
USE.NAMES = TRUE) # returns TRUE for all datasets
# The scale.data slots match for the lapply approach before and after merging
lapply(X = obj.l,
FUN = function(l,
i,
l5) {
lapply(X = i,
FUN = function(i,
l,
l5) {
table(l@[email protected][i,] == l5[i,colnames(x = l@[email protected])])
},
l = l,
l5 = l5)
},
i = 1:10, # check the first 10 genes
l5 = obj.merged.l5@[email protected]) # returns TRUE for all genes
Here is also my sessionInfo()
:
> devtools::session_info()
─ Session info ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.3.1 (2023-06-16)
os macOS Monterey 12.7
system x86_64, darwin20
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/Zurich
date 2024-04-11
rstudio 2023.09.0+463 Desert Sunflower (desktop)
pandoc 3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
─ Packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
abind 1.4-5 2016-07-21 [1] CRAN (R 4.3.0)
backports 1.4.1 2021-12-13 [1] CRAN (R 4.3.0)
base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.3.0)
bbmle 1.0.25.1 2023-12-09 [1] CRAN (R 4.3.0)
bdsmatrix 1.3-7 2024-03-02 [1] CRAN (R 4.3.2)
bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.0)
cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.0)
caTools 1.18.2 2021-03-28 [1] CRAN (R 4.3.0)
checkmate 2.3.1 2023-12-04 [1] CRAN (R 4.3.0)
cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.0)
cluster 2.1.6 2023-12-01 [1] CRAN (R 4.3.0)
clustree 0.5.1 2023-11-05 [1] CRAN (R 4.3.0)
codetools 0.2-20 2024-03-31 [1] CRAN (R 4.3.2)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.0)
cowplot 1.1.3 2024-01-22 [1] CRAN (R 4.3.2)
curl 5.2.1 2024-03-01 [1] CRAN (R 4.3.2)
data.table 1.15.4 2024-03-30 [1] CRAN (R 4.3.2)
deldir 2.0-4 2024-02-28 [1] CRAN (R 4.3.2)
densEstBayes 1.0-2.2 2023-03-31 [1] CRAN (R 4.3.0)
devtools 2.4.5 2022-10-11 [1] CRAN (R 4.3.0)
digest 0.6.35 2024-03-11 [1] CRAN (R 4.3.2)
dotCall64 1.1-1 2023-11-28 [1] CRAN (R 4.3.0)
dplyr 1.1.4 2023-11-17 [1] CRAN (R 4.3.0)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.3.0)
evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.0)
fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.0)
farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.0)
fastDummies 1.7.3 2023-07-06 [1] CRAN (R 4.3.0)
fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0)
fitdistrplus 1.1-11 2023-04-25 [1] CRAN (R 4.3.0)
foreign 0.8-86 2023-11-28 [1] CRAN (R 4.3.0)
Formula 1.2-5 2023-02-24 [1] CRAN (R 4.3.0)
fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.0)
future 1.33.2 2024-03-26 [1] CRAN (R 4.3.2)
future.apply 1.11.2 2024-03-28 [1] CRAN (R 4.3.2)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
ggforce 0.4.2 2024-02-19 [1] CRAN (R 4.3.2)
ggplot2 3.5.0 2024-02-23 [1] CRAN (R 4.3.2)
ggraph 2.2.1 2024-03-07 [1] CRAN (R 4.3.2)
ggrepel 0.9.5 2024-01-10 [1] CRAN (R 4.3.0)
ggridges 0.5.6 2024-01-23 [1] CRAN (R 4.3.2)
globals 0.16.3 2024-03-08 [1] CRAN (R 4.3.2)
glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.0)
goftest 1.2-3 2021-10-07 [1] CRAN (R 4.3.0)
gplots 3.1.3.1 2024-02-02 [1] CRAN (R 4.3.2)
graphlayouts 1.1.1 2024-03-09 [1] CRAN (R 4.3.2)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.3.0)
gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.0)
gtools 3.9.5 2023-11-20 [1] CRAN (R 4.3.0)
Hmisc 5.1-2 2024-03-11 [1] CRAN (R 4.3.2)
htmlTable 2.4.2 2023-10-29 [1] CRAN (R 4.3.0)
htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.3.1)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.3.0)
httpuv 1.6.15 2024-03-26 [1] CRAN (R 4.3.2)
httr 1.4.7 2023-08-15 [1] CRAN (R 4.3.0)
ica 1.0-3 2022-07-08 [1] CRAN (R 4.3.0)
igraph 2.0.3 2024-03-13 [1] CRAN (R 4.3.2)
inline 0.3.19 2021-05-31 [1] CRAN (R 4.3.0)
irlba 2.3.5.1 2022-10-03 [1] CRAN (R 4.3.0)
jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.3.0)
KernSmooth 2.23-22 2023-07-10 [1] CRAN (R 4.3.0)
knitr 1.45 2023-10-30 [1] CRAN (R 4.3.0)
later 1.3.2 2023-12-06 [1] CRAN (R 4.3.0)
lattice 0.22-6 2024-03-20 [1] CRAN (R 4.3.2)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.3.0)
leiden 0.4.3.1 2023-11-17 [1] CRAN (R 4.3.0)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.0)
listenv 0.9.1 2024-01-29 [1] CRAN (R 4.3.2)
lmtest 0.9-40 2022-03-21 [1] CRAN (R 4.3.0)
loo 2.7.0 2024-02-24 [1] CRAN (R 4.3.2)
M3Drop 3.10.6 2023-10-05 [1] Github (tallulandrews/M3Drop@9128921)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
MASS 7.3-60.0.1 2024-01-13 [1] CRAN (R 4.3.0)
Matrix 1.6-5 2024-01-11 [1] CRAN (R 4.3.0)
matrixStats 1.2.0 2023-12-11 [1] CRAN (R 4.3.0)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.0)
mgcv 1.9-1 2023-12-21 [1] CRAN (R 4.3.0)
mime 0.12 2021-09-28 [1] CRAN (R 4.3.0)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.3.0)
munsell 0.5.1 2024-04-01 [1] CRAN (R 4.3.2)
mvtnorm 1.2-4 2023-11-27 [1] CRAN (R 4.3.0)
nlme 3.1-164 2023-11-27 [1] CRAN (R 4.3.0)
nnet 7.3-19 2023-05-03 [1] CRAN (R 4.3.1)
numDeriv 2016.8-1.1 2019-06-06 [1] CRAN (R 4.3.0)
parallelly 1.37.1 2024-02-29 [1] CRAN (R 4.3.2)
patchwork 1.2.0 2024-01-08 [1] CRAN (R 4.3.0)
pbapply 1.7-2 2023-06-27 [1] CRAN (R 4.3.0)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0)
pkgbuild 1.4.4 2024-03-17 [1] CRAN (R 4.3.2)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
pkgload 1.3.4 2024-01-16 [1] CRAN (R 4.3.0)
plotly 4.10.4 2024-01-13 [1] CRAN (R 4.3.0)
plyr 1.8.9 2023-10-02 [1] CRAN (R 4.3.0)
png 0.1-8 2022-11-29 [1] CRAN (R 4.3.0)
polyclip 1.10-6 2023-09-27 [1] CRAN (R 4.3.0)
profvis 0.3.8 2023-05-02 [1] CRAN (R 4.3.0)
progressr 0.14.0 2023-08-10 [1] CRAN (R 4.3.0)
promises 1.2.1 2023-08-10 [1] CRAN (R 4.3.0)
purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.0)
QuickJSR 1.1.3 2024-01-31 [1] CRAN (R 4.3.2)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
RANN 2.6.1 2019-01-08 [1] CRAN (R 4.3.0)
RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.0)
Rcpp 1.0.12 2024-01-09 [1] CRAN (R 4.3.0)
RcppAnnoy 0.0.22 2024-01-23 [1] CRAN (R 4.3.2)
RcppHNSW 0.6.0 2024-02-04 [1] CRAN (R 4.3.2)
RcppParallel 5.1.7 2023-02-27 [1] CRAN (R 4.3.0)
reldist 1.7-2 2023-02-17 [1] CRAN (R 4.3.0)
remotes 2.5.0 2024-03-17 [1] CRAN (R 4.3.2)
reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.3.0)
reticulate 1.35.0 2024-01-31 [1] CRAN (R 4.3.2)
rlang 1.1.3 2024-01-10 [1] CRAN (R 4.3.0)
rmarkdown 2.26 2024-03-05 [1] CRAN (R 4.3.2)
ROCR 1.0-11 2020-05-02 [1] CRAN (R 4.3.0)
rpart 4.1.23 2023-12-05 [1] CRAN (R 4.3.0)
RSpectra 0.16-1 2022-04-24 [1] CRAN (R 4.3.0)
rstan 2.32.6 2024-03-05 [1] CRAN (R 4.3.2)
rstantools 2.4.0 2024-01-31 [1] CRAN (R 4.3.2)
rstudioapi 0.16.0 2024-03-24 [1] CRAN (R 4.3.2)
Rtsne 0.17 2023-12-07 [1] CRAN (R 4.3.0)
scales 1.3.0 2023-11-28 [1] CRAN (R 4.3.0)
scattermore 1.2 2023-06-12 [1] CRAN (R 4.3.0)
sctransform 0.4.1 2023-10-19 [1] CRAN (R 4.3.0)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
Seurat * 5.0.3 2024-03-18 [1] CRAN (R 4.3.2)
SeuratObject * 5.0.1 2023-11-17 [1] CRAN (R 4.3.0)
shiny 1.8.1.1 2024-04-02 [1] CRAN (R 4.3.1)
sp * 2.1-3 2024-01-30 [1] CRAN (R 4.3.2)
spam 2.10-0 2023-10-23 [1] CRAN (R 4.3.0)
spatstat.data 3.0-4 2024-01-15 [1] CRAN (R 4.3.0)
spatstat.explore 3.2-7 2024-03-21 [1] CRAN (R 4.3.2)
spatstat.geom 3.2-9 2024-02-28 [1] CRAN (R 4.3.2)
spatstat.random 3.2-3 2024-02-29 [1] CRAN (R 4.3.2)
spatstat.sparse 3.0-3 2023-10-24 [1] CRAN (R 4.3.0)
spatstat.utils 3.0-4 2023-10-24 [1] CRAN (R 4.3.0)
StanHeaders 2.32.6 2024-03-01 [1] CRAN (R 4.3.2)
statmod 1.5.0 2023-01-06 [1] CRAN (R 4.3.0)
stringi 1.8.3 2023-12-11 [1] CRAN (R 4.3.0)
stringr 1.5.1 2023-11-14 [1] CRAN (R 4.3.0)
survival 3.5-8 2024-02-14 [1] CRAN (R 4.3.2)
tensor 1.5 2012-05-05 [1] CRAN (R 4.3.0)
tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
tidygraph 1.3.1 2024-01-30 [1] CRAN (R 4.3.2)
tidyr 1.3.1 2024-01-24 [1] CRAN (R 4.3.2)
tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.3.2)
tweenr 2.0.3 2024-02-26 [1] CRAN (R 4.3.2)
urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.3.0)
usethis 2.2.3 2024-02-19 [1] CRAN (R 4.3.2)
utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.0)
uwot 0.1.16 2023-06-29 [1] CRAN (R 4.3.0)
V8 4.4.2 2024-02-15 [1] CRAN (R 4.3.2)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.0)
viridis 0.6.5 2024-01-29 [1] CRAN (R 4.3.2)
viridisLite 0.4.2 2023-05-02 [1] CRAN (R 4.3.0)
withr 3.0.0 2024-01-16 [1] CRAN (R 4.3.0)
xfun 0.43 2024-03-25 [1] CRAN (R 4.3.2)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.0)
zoo 1.8-12 2023-04-13 [1] CRAN (R 4.3.0)
[1] /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/library
Best, Leon
Well, if you SCTransform each individual dataset and then merge it shouldnt have the same output as if you SCT all your datasets together. You have different data input when you conduct the normilization. Except if you merge but still individualy SCT each layer which should have the same output as before merging each one.
Hi @NeuralBind,
The code I provided above splits the merged object before computing Seurat::SCTransform()
. Therefore, Seurat::SCTransform()
should run on each dataset individually. These two approaches should yield the same results. And as a matter of fact, if the datasets contain the same genes, the results match between the two approaches. This issue is due to the merge
function of Seurat
mishandling data with different sets of genes.
Best, Leon
Well i first tried it couple day ago, i merged two SCT transformed datasets (it kept the each single layer SCT$count1, SCT$count2 etc.), but by the time you run SCT again it transforms the whole dataset (and not individualy) and overwrites with a single layer for the whole set. Which is something you want in order to visualize the variations of similar groups for different conditions etc. You can try to play a bit with the Join layers function (for joining or spliting) and look what SCT does to the merged set. In the vignettes this part was a bit trivial so i just trial an error to see the function. For the different gene set it shouldnt be a problem as it will add more genes but they will be no expressed in a part of your dataset, so they will be transformed based on that.
Hi @NeuralBind,
I think you misunderstood the problem. Let us wait for an answer from the Seurat team regarding this issue.
Best, Leon