ArchR
ArchR copied to clipboard
Error in addGeneIntegrationMatrix / abnormally high number of communities
When executing the code provided, the following error occurs:
> mRCCv5
An object of class Seurat
22921 features across 102723 samples within 1 assay
Active assay: RNA (22921 features, 2000 variable features)
2 layers present: counts, data
> proj
___ .______ ______ __ __ .______
/ \ | _ \ / || | | | | _ \
/ ^ \ | |_) | | ,----'| |__| | | |_) |
/ /_\ \ | / | | | __ | | /
/ _____ \ | |\ \\___ | `----.| | | | | |\ \\___.
/__/ \__\ | _| `._____| \______||__| |__| | _| `._____|
class: ArchRProject
outputDirectory: atac_RCC100
samples(1): RCC100_ver2
sampleColData names(1): ArrowFiles
cellColData names(16): Sample TSSEnrichment ... BlacklistRatio
Clusters
numberOfCells(1): 44844
medianTSS(1): 6.249
medianFrags(1): 3221
> proj <- addGeneIntegrationMatrix(
+ ArchRProj = proj,
+ useMatrix = "GeneScoreMatrix",
+ matrixName = "GeneIntegrationMatrix",
+ reducedDims = "IterativeLSI",
+ seRNA = mRCCv5,
+ sampleCellsRNA = 5000,
+ addToArrow = F,
+ groupRNA = "BioClassification",
+ nameCell = "predictedCell_Un",
+ nameGroup = "predictedGroup_Un",
+ nameScore = "predictedScore_Un",
+ force = T
+ )
The single-cell ATAC-seq dataset consists of multiple samples, but the above function was executed only on a single sample, and even when performed on an ArchRProject that combines multiple samples, the same error occurs. I have also tried switching to different Seurat assay versions such as v3, v5, but the same error persists.
ArchR logging to : ArchRLogs/ArchR-addGeneIntegrationMatrix-104ce73bf6ad1-Date-2023-07-28_Time-01-19-53.700155.log
If there is an issue, please report to github with logFile!
2023-07-28 01:19:54.010202 : Running Seurat's Integration Stuart* et al 2019, 0.005 mins elapsed.
2023-07-28 01:19:54.041649 : Checking ATAC Input, 0.006 mins elapsed.
2023-07-28 01:19:59.577644 : Checking RNA Input, 0.098 mins elapsed.
Warning: The following arguments are not used: layer
2023-07-28 01:20:16.147696 : Found 17989 overlapping gene names from gene scores and rna matrix!, 0.374 mins elapsed.
2023-07-28 01:20:16.14888 : Creating Integration Blocks, 0.374 mins elapsed.
2023-07-28 01:20:18.104443 : Prepping Interation Data, 0.407 mins elapsed.
2023-07-28 01:20:18.623075 : Computing Integration in 5 Integration Blocks!, 0 mins elapsed.
Error in .safelapply(seq_along(blockList), function(i) { :
Error Found Iteration 1 :
[1] "Error in slot(object = object, name = \"features\")[[layer]] <- features : \n more elements supplied than there are to replace\n"
<simpleError in slot(object = object, name = "features")[[layer]] <- features: more elements supplied than there are to replace>
Error Found Iteration 2 :
[1] "Error in slot(object = object, name = \"features\")[[layer]] <- features : \n more elements supplied than there are to replace\n"
<simpleError in slot(object = object, name = "features")[[layer]] <- features: more elements supplied than there are to replace>
Error Found Iteration 3 :
[1] "Error in slot(object = object, name = \"features\")[[layer]] <- features : \n more elements supplied than there are to replace\n"
<simpleError in slot(object = object, name = "features")[[layer]] <- features: more elements supplied than there are to replace>
Error Found Iteration 4 :
[1] "Error in slot(object = object, name = \"features\")[[layer]] <- features :
In addition: Warning message:
In mclapply(..., mc.cores = threads, mc.preschedule = preschedule) :
5 function calls resulted in an error
Below is the result of executing sessioninfo()
:
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
Random number generation:
RNG: L'Ecuyer-CMRG
Normal: Inversion
Sample: Rejection
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] parallel stats4 grid stats graphics grDevices utils datasets methods
[10] base
other attached packages:
[1] hexbin_1.28.3 nabor_0.5.0
[3] uwot_0.1.16 BSgenome.Hsapiens.UCSC.hg38_1.4.5
[5] BSgenome_1.68.0 rtracklayer_1.60.0
[7] Biostrings_2.68.1 XVector_0.40.0
[9] Seurat_4.9.9.9049 SeuratObject_4.9.9.9085
[11] sp_1.6-1 trqwe_0.1
[13] rhdf5_2.44.0 SummarizedExperiment_1.30.1
[15] Biobase_2.60.0 MatrixGenerics_1.12.0
[17] Rcpp_1.0.11 Matrix_1.6-0
[19] GenomicRanges_1.52.0 GenomeInfoDb_1.36.0
[21] IRanges_2.34.0 S4Vectors_0.38.1
[23] BiocGenerics_0.46.0 matrixStats_1.0.0
[25] data.table_1.14.8 stringr_1.5.0
[27] plyr_1.8.8 magrittr_2.0.3
[29] ggplot2_3.4.2 gtable_0.3.3
[31] gtools_3.9.4 gridExtra_2.3
[33] ArchR_1.0.2
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.15.0 jsonlite_1.8.5
[4] spatstat.utils_3.0-3 farver_2.1.1 BiocIO_1.10.0
[7] zlibbioc_1.46.0 vctrs_0.6.3 ROCR_1.0-11
[10] Rsamtools_2.16.0 Cairo_1.6-0 spatstat.explore_3.2-1
[13] RCurl_1.98-1.12 htmltools_0.5.5 S4Arrays_1.0.4
[16] Rhdf5lib_1.22.0 sctransform_0.3.5 parallelly_1.36.0
[19] KernSmooth_2.23-22 htmlwidgets_1.6.2 ica_1.0-3
[22] plotly_4.10.2 zoo_1.8-12 GenomicAlignments_1.36.0
[25] igraph_1.5.0 mime_0.12 lifecycle_1.0.3
[28] pkgconfig_2.0.3 R6_2.5.1 fastmap_1.1.1
[31] GenomeInfoDbData_1.2.10 fitdistrplus_1.1-11 future_1.33.0
[34] shiny_1.7.4.1 digest_0.6.33 colorspace_2.1-0
[37] patchwork_1.1.2 tensor_1.5 RSpectra_0.16-1
[40] irlba_2.3.5.1 labeling_0.4.2 progressr_0.13.0
[43] fansi_1.0.4 spatstat.sparse_3.0-2 polyclip_1.10-4
[46] httr_1.4.6 abind_1.4-5 compiler_4.3.0
[49] withr_2.5.0 BiocParallel_1.34.1 fastDummies_1.6.3
[52] MASS_7.3-60 DelayedArray_0.26.2 rjson_0.2.21
[55] tools_4.3.0 lmtest_0.9-40 httpuv_1.6.11
[58] future.apply_1.11.0 goftest_1.2-3 glue_1.6.2
[61] restfulr_0.0.15 nlme_3.1-162 rhdf5filters_1.12.1
[64] promises_1.2.0.1 Rtsne_0.16 cluster_2.1.4
[67] reshape2_1.4.4 generics_0.1.3 spatstat.data_3.0-1
[70] tidyr_1.3.0 utf8_1.2.3 spatstat.geom_3.2-4
[73] RcppAnnoy_0.0.21 ggrepel_0.9.3 RANN_2.6.1
[76] pillar_1.9.0 spam_2.9-1 RcppHNSW_0.4.1
[79] later_1.3.1 splines_4.3.0 dplyr_1.1.2
[82] lattice_0.21-8 deldir_1.0-9 survival_3.5-5
[85] tidyselect_1.2.0 miniUI_0.1.1.1 pbapply_1.7-2
[88] scattermore_1.2 stringi_1.7.12 yaml_2.3.7
[91] lazyeval_0.2.2 codetools_0.2-19 tibble_3.2.1
[94] cli_3.6.1 xtable_1.8-4 reticulate_1.30
[97] munsell_0.5.0 spatstat.random_3.1-5 globals_0.16.2
[100] png_0.1-8 XML_3.99-0.14 ellipsis_0.3.2
[103] dotCall64_1.0-2 bitops_1.0-7 listenv_0.9.0
[106] viridisLite_0.4.2 scales_1.2.1 ggridges_0.5.4
[109] leiden_0.4.3 purrr_1.0.1 crayon_1.5.2
[112] rlang_1.1.1 cowplot_1.1.1
During the execution of doubletScoring, the following message was displayed, and when performing clustering, the number of communities was abnormally high. For the given sample, 130 communities were identified, and 117 singletons were merged into one cluster. In other datasets with 19 samples, the number of communities showed 10365, and even after 13 hours, the addCluster function is still running.
Correlation of UMAP Projection is below 0.9 (normally this is ~0.99)
This means there is little heterogeneity in your sample, and thus doubletCalling is inaccurate. force = FALSE, thus returning -1 doubletScores and doubletEnrichments!
The purpose of this issue is to resolve the error in 'addGeneIntegrationMatrix'. Additionally, I would appreciate it if you could suggest appropriate options related to the 'addDoubletScores' function. Thank you.
Hi @Khreat0205! Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues.
It is worth noting that there are very few actual bugs in ArchR. If you are getting an error, it is probably something specific to your dataset, usage, or computational environment, all of which are extremely challenging to troubleshoot. As such, we require reproducible examples (preferably using the tutorial dataset) from users who want assistance. If you cannot reproduce your error, we will not be able to help.
Before going through the work of making a reproducible example, search the previous Issues, Discussions, function definitions, or the ArchR manual and you will likely find the answers you are looking for.
If your post does not contain a reproducible example, it is unlikely to receive a response.
__In addition to a reproducible example, you must do the following things before we help you, unless your original post already contained this information:
1. If you've encountered an error, have you already searched previous Issues to make sure that this hasn't already been solved?
2. Did you post your log file? If not, add it now.
3. Remove any screenshots that contain text and instead copy and paste the text using markdown's codeblock syntax (three consecutive backticks). You can do this by editing your original post.
I am encountering an error when using the addGeneIntegrationMatrix function. Here is the error message I am getting:
Error in .safelapply(seq_along(blockList), function(i) { :
Error Found Iteration 1 :
[1] "Error in slot(object = object, name = \"features\")[[layer]] <- features : \n more elements supplied than there are to replace\n"
<simpleError in slot(object = object, name = "features")[[layer]] <- features: more elements supplied than there are to replace>
Error Found Iteration 2 :
[1] "Error in slot(object = object, name = \"features\")[[layer]] <- features : \n more elements supplied than there are to replace\n"
<simpleError in slot(object = object, name = "features")[[layer]] <- features: more elements supplied than there are to replace>
Error Found Iteration 3 :
[1] "Error in slot(object = object, name = \"features\")[[layer]] <- features : \n more elements supplied than there are to replace\n"
<simpleError in slot(object = object, name = "features")[[layer]] <- features: more elements supplied than there are to replace>
Error Found Iteration 4 :
[1] "Error in slot(object = object, name = \"features\")[[layer]] <- features :
In addition: Warning message:
To help you understand the full context of what I was doing when this error occurred, I have attached the log file from that time. Please note that I've looked through all previous issues related to this but couldn't find anything that might help solve my problem.
If you need more information about this error, please let me know. I am eager to resolve this issue as soon as possible.
Thank you.
any update on this issue?
I recently encountered a similar message:
Error in slot(object = object, name = "cells")[[layer]] <- cells :
more elements supplied than there are to replace
The only difference is that I have problem with cell names but not feature names. I also found that I only have this problem in the last integration block, but not the preceding ones. So to figure out why, I decided to run script pf addGeneIntegrationMatrix
line by line. After a lot of digging around, I found that the reason for my error message is that the column names of this matrix is an array!!
mat <- ArchR:::.getPartialMatrix(getArrowFiles(subProj), featureDF = geneDF[geneDF$name %in% genesUse, ],
threads = 2, cellNames = subProj$cellNames, useMatrix = useMatrix, verbose = FALSE)
and
mat <- suppressMessages(imputeMatrix(mat = mat, imputeWeights = getImputeWeights(subProj), verbose = FALSE))
> typeof(colnames(mat[1:3,1:3]))
[1] "character"
> class(colnames(mat[1:3,1:3]))
[1] "array"
Converting them to character colnames(mat) <- as.character(colnames(mat))
solves the problem.
But there's still two things that are strange:
- This only happened in my most recent analysis. Previous analysis are totally fine. I do not recall updating anything recently.
- This only happened to the cells in the last integration, no matter how many blocks there are.
When I checked the
mat
from the first block:
> class(colnames(mat))
[1] "character"
> typeof(colnames(mat))
[1] "character"
I have no clue why these two things happen. But hopefully this is helpful.
ArchR version: ArchR * 1.0.3 2023-02-18 [1] Github (GreenleafLab/ArchR@55ab73e)
it also seems that in this case, "character" colnames were converted to "array" in imputeMatrix()
step.
First command isn’t working in my case
which one?
mat <- ArchR:::.getPartialMatrix(getArrowFiles(subProj), featureDF = geneDF[geneDF$name %in% genesUse, ], threads = 2, cellNames = subProj$cellNames, useMatrix = useMatrix, verbose = FALSE)
Would you mind to share your full code if possible? I appreciate your willingness to help. Thanks
I see. I think you might see the error because this is extracted from the function, so you cannot just run it independently without getting the intermediate inputs. Here is the full function. Note two things though: (1) this function is just the addGeneIntegration
function, I only added line 225, 226 in this function; (2) this function does not generate a log file, because logging functions in ArchR are not exported so it's a bit difficult to work with outside of ArchR; you will have to wait until the whole function finishes to see if you get the results in your ArchR project.
testIntegration <- function (ArchRProj = NULL, useMatrix = "GeneScoreMatrix", matrixName = "GeneIntegrationMatrix",
reducedDims = "IterativeLSI", seRNA = NULL, groupATAC = NULL,
groupRNA = NULL, groupList = NULL, sampleCellsATAC = 10000,
sampleCellsRNA = 10000, embeddingATAC = NULL, embeddingRNA = NULL,
dimsToUse = 1:30, scaleDims = NULL, corCutOff = 0.75, plotUMAP = TRUE,
UMAPParams = list(n_neighbors = 40, min_dist = 0.4, metric = "cosine",
verbose = FALSE), nGenes = 2000, useImputation = TRUE,
reduction = "cca", addToArrow = TRUE, scaleTo = 10000, genesUse = NULL,
nameCell = "predictedCell", nameGroup = "predictedGroup",
nameScore = "predictedScore", transferParams = list(), threads = getArchRThreads(),
verbose = TRUE, force = FALSE, logFile = createLogFile("addGeneIntegrationMatrix"),
...)
{
if (is.null(groupList)) {
groupList <- SimpleList()
groupList[[1]] <- SimpleList(ATAC = ArchRProj$cellNames, RNA = colnames(seRNA))
}
if (useMatrix %ni% getAvailableMatrices(ArchRProj)) {
stop("Matrix name provided to useMatrix does not exist in ArchRProject!")
}
if (!is.null(groupATAC)) {
dfATAC <- getCellColData(ArchRProj = ArchRProj, select = groupATAC, drop = FALSE)
}
nCell <- rep(0, length(ArchRProj$cellNames))
names(nCell) <- ArchRProj$cellNames
groupList <- lapply(seq_along(groupList), function(x) {
ATAC <- groupList[[x]]$ATAC
if (!is.null(groupATAC)) {
if (any(ATAC %in% dfATAC[, 1])) {
idx <- which(ATAC %in% dfATAC[, 1])
ATAC2 <- rownames(dfATAC)[which(dfATAC[, 1] %in% ATAC[idx])]
if (length(idx) == length(ATAC)) {
ATAC <- ATAC2
}
else {
ATAC <- c(ATAC[-idx], ATAC2)
}
}
}
SimpleList(ATAC = ATAC, RNA = groupList[[x]]$RNA)
}) %>% SimpleList
for (i in seq_along(groupList)) {
nCell[groupList[[i]]$ATAC] <- nCell[groupList[[i]]$ATAC] + 1
}
if (!all(nCell == 1)) {
stop("Missing ", length(which(nCell == 0)), " cells. Found ",
length(which(nCell > 1)), " overlapping cells from ArchRProj in groupList! Cannot have overlapping/missing cells in ATAC input, check 'groupList' argument!")
}
if (inherits(seRNA, "SummarizedExperiment")) {
seuratRNA <- CreateSeuratObject(counts = assay(seRNA))
if (groupRNA %ni% colnames(colData(seRNA))) {
stop("groupRNA not in colData of seRNA")
}
seuratRNA$Group <- paste0(colData(seRNA)[, groupRNA, drop = TRUE])
rm(seRNA)
} else {
if (groupRNA %ni% colnames([email protected])) {
stop("groupRNA not in meta.data of Seurat Object")
}
seuratRNA <- seRNA
seuratRNA$Group <- paste0([email protected][, groupRNA])
rm(seRNA)
}
if ("RNA" %in% names(seuratRNA@assays)) {
DefaultAssay(seuratRNA) <- "RNA"
} else {
stop("'RNA' is not present in Seurat Object's Assays! Please make sure that this assay is present!")
}
gc()
if (!is.null(groupRNA)) {
dfRNA <- DataFrame(row.names = colnames(seuratRNA), Group = seuratRNA$Group)
}
groupList <- lapply(seq_along(groupList), function(x) {
RNA <- groupList[[x]]$RNA
if (!is.null(groupRNA)) {
if (any(RNA %in% dfRNA[, 1])) {
idx <- which(RNA %in% dfRNA[, 1])
RNA2 <- rownames(dfRNA)[which(dfRNA[, 1] %in% RNA[idx])]
if (length(idx) == length(RNA)) {
RNA <- RNA2
}
else {
RNA <- c(RNA[-idx], RNA2)
}
}
}
SimpleList(ATAC = groupList[[x]]$ATAC, RNA = RNA)
}) %>% SimpleList
cellRNA <- unlist(lapply(groupList, function(x) x$RNA))
if (!all(cellRNA %in% colnames(seuratRNA))) {
stop("Found cells for RNA not in colnames(seRNA)! Please retry your input!")
}
seuratRNA <- seuratRNA[, unique(cellRNA)]
seuratRNA <- NormalizeData(object = seuratRNA, verbose = FALSE)
geneDF <- ArchR:::.getFeatureDF(getArrowFiles(ArchRProj), useMatrix)
sumOverlap <- sum(unique(geneDF$name) %in% unique(rownames(seuratRNA)))
if (sumOverlap < 5) {
stop("Error not enough overlaps (", sumOverlap, ") between gene names from gene scores (ArchR) and rna matrix (seRNA)!")
}
blockList <- SimpleList()
for (i in seq_along(groupList)) {
gLi <- groupList[[i]]
if (length(gLi$ATAC) > sampleCellsATAC) {
if (!is.null(embeddingATAC)) {
probATAC <- ArchR:::.getDensity(embeddingATAC[gLi$ATAC, 1], embeddingATAC[gLi$ATAC, 2])$density
probATAC <- probATAC/max(probATAC)
cellsATAC <- gLi$ATAC[order(probATAC, decreasing = TRUE)]
}
else {
cellsATAC <- sample(gLi$ATAC, length(gLi$ATAC))
}
cutoffs <- lapply(seq_len(1000), function(x) length(gLi$ATAC)/x) %>%
unlist
blockSize <- ceiling(min(cutoffs[order(abs(cutoffs - sampleCellsATAC))[1]] + 1, length(gLi$ATAC)))
nBlocks <- ceiling(length(gLi$ATAC)/blockSize)
blocks <- lapply(seq_len(nBlocks), function(x) {
cellsATAC[seq(x, length(cellsATAC), nBlocks)]
}) %>% SimpleList
}
else {
blocks <- list(gLi$ATAC)
}
if (!is.null(embeddingRNA)) {
probRNA <- .getDensity(embeddingRNA[gLi$RNA, 1],
embeddingRNA[gLi$RNA, 2])$density
probRNA <- probRNA/max(probRNA)
}
else {
probRNA <- rep(1, length(gLi$RNA))
}
blockListi <- lapply(seq_along(blocks), function(x) {
SimpleList(ATAC = blocks[[x]], RNA = sample(x = gLi$RNA,
size = min(sampleCellsRNA, length(gLi$RNA)),
prob = probRNA))
}) %>% SimpleList
blockList <- c(blockList, blockListi)
}
rm(groupList)
subProj <- ArchRProj
subProj@imputeWeights <- SimpleList()
geneDF <- ArchR:::.getFeatureDF(getArrowFiles(subProj), useMatrix)
geneDF <- geneDF[geneDF$name %in% rownames(seuratRNA), , drop = FALSE]
splitGeneDF <- S4Vectors::split(geneDF, geneDF$seqnames)
featureDF <- lapply(splitGeneDF, function(x) {
x$idx <- seq_len(nrow(x))
return(x)
}) %>% Reduce("rbind", .)
dfParams <- data.frame(reduction = reduction)
allChr <- unique(featureDF$seqnames)
tmpFile <- ArchR:::.tempfile()
o <- suppressWarnings(file.remove(paste0(tmpFile, "-IntegrationBlock-", seq_along(blockList), ".h5")))
h5lock <- setArchRLocking()
if (h5lock) {
if (threads > 1) {
message("subThreading Disabled since ArchRLocking is TRUE see `addArchRLocking`")
threads <- 1
}
} else {
if (threads > 1) {
message("subThreading Enabled since ArchRLocking is FALSE see `addArchRLocking`")
}
}
rD <- getReducedDims(ArchRProj = ArchRProj, reducedDims = reducedDims,
corCutOff = corCutOff, dimsToUse = dimsToUse)
outDir1 <- getOutputDirectory(ArchRProj)
outDir2 <- file.path(outDir1, "RNAIntegration")
outDir3 <- file.path(outDir2, matrixName)
dir.create(outDir1, showWarnings = FALSE)
dir.create(outDir2, showWarnings = FALSE)
dir.create(outDir3, showWarnings = FALSE)
prevFiles <- list.files(outDir3, full.names = TRUE)
prevFiles <- ArchR:::.suppressAll(file.remove(prevFiles))
threads2 <- max(ceiling(threads * 0.75), 1)
dfAll <- ArchR:::.safelapply(seq_along(blockList), function(i) {
prefix <- sprintf("Block (%s of %s) :", i, length(blockList))
blocki <- blockList[[i]]
subProj@cellColData <- subProj@cellColData[blocki$ATAC, ]
subProj@sampleColData <- subProj@sampleColData[unique(subProj$Sample), , drop = FALSE]
subRNA <- seuratRNA[, blocki$RNA]
subRNA <- subRNA[rownames(subRNA) %in% geneDF$name, ]
subRNA <- FindVariableFeatures(object = subRNA, nfeatures = nGenes, verbose = FALSE)
subRNA <- ScaleData(object = subRNA, verbose = FALSE)
if (is.null(genesUse)) {
genesUse <- VariableFeatures(object = subRNA)
}
mat <- ArchR:::.getPartialMatrix(getArrowFiles(subProj), featureDF = geneDF[geneDF$name %in% genesUse, ],
threads = 2, cellNames = subProj$cellNames, useMatrix = useMatrix, verbose = FALSE)
rownames(mat) <- geneDF[geneDF$name %in% genesUse, "name"]
if (useImputation) {
imputeParams <- list()
imputeParams$ArchRProj <- subProj
imputeParams$randomSuffix <- TRUE
imputeParams$reducedDims <- reducedDims
imputeParams$dimsToUse <- dimsToUse
imputeParams$scaleDims <- scaleDims
imputeParams$corCutOff <- corCutOff
imputeParams$threads <- 1
# imputeParams$logFile <- logFile
subProj <- suppressMessages(do.call(addImputeWeights, imputeParams))
mat <- suppressMessages(imputeMatrix(mat = mat, imputeWeights = getImputeWeights(subProj), verbose = FALSE))
o <- suppressWarnings(file.remove(unlist(getImputeWeights(subProj)[[1]])))
}
mat <- log(mat + 1)
colnames(mat) <- as.character(colnames(mat))
rownames(mat) <- as.character(rownames(mat))
seuratATAC <- Seurat::CreateSeuratObject(counts = mat[head(seq_len(nrow(mat)), 5), , drop = FALSE])
seuratATAC[["GeneScore"]] <- Seurat::CreateAssayObject(counts = mat)
rm(mat)
DefaultAssay(seuratATAC) <- "GeneScore"
seuratATAC <- Seurat::ScaleData(seuratATAC, verbose = FALSE)
transferAnchors <- ArchR:::.retryCatch({
gc()
Seurat::FindTransferAnchors(reference = subRNA, query = seuratATAC, reduction = reduction,
features = genesUse, verbose = FALSE, ...)
}, maxAttempts = 2, logFile = logFile)
rDSub <- rD[colnames(seuratATAC), , drop = FALSE]
transferParams$anchorset <- transferAnchors
transferParams$weight.reduction <- CreateDimReducObject(embeddings = rDSub, key = "LSI_", assay = DefaultAssay(seuratATAC))
transferParams$verbose <- FALSE
transferParams$dims <- seq_len(ncol(rDSub))
transferParams$refdata <- subRNA$Group
rnaLabels <- do.call(Seurat::TransferData, transferParams)
transferParams$refdata <- colnames(subRNA)
rnaLabels2 <- do.call(Seurat::TransferData, transferParams)[, 1]
if (addToArrow) {
transferParams$refdata <- GetAssayData(subRNA, assay = "RNA", slot = "data")
gc()
matchedRNA <- do.call(Seurat::TransferData, transferParams)
matchedRNA <- matchedRNA@data
}
matchDF <- DataFrame(cellNames = colnames(seuratATAC),
predictionScore = rnaLabels$prediction.score.max,
predictedGroup = rnaLabels$predicted.id, predictedCell = rnaLabels2)
rownames(matchDF) <- matchDF$cellNames
jointCCA <- DataFrame([email protected][[1]]@[email protected])
jointCCA$Assay <- ifelse(endsWith(rownames(jointCCA), "_reference"), "RNA", "ATAC")
jointCCA$Group <- NA
jointCCA$Score <- NA
jointCCA[paste0(colnames(subRNA), "_reference"), "Group"] <- subRNA$Group
jointCCA[paste0(matchDF$cellNames, "_query"), "Group"] <- matchDF$predictedGroup
jointCCA[paste0(matchDF$cellNames, "_query"), "Score"] <- matchDF$predictionScore
ArchR:::.safeSaveRDS(object = jointCCA, file = file.path(outDir3, paste0("Save-Block", i, "-JointCCA.rds")))
rm(transferParams, transferAnchors)
gc()
if (addToArrow) {
tmpFilei <- paste0(tmpFile, "-IntegrationBlock-", i, ".h5")
o <- h5createFile(tmpFilei)
sampleNames <- getCellColData(subProj, "Sample")[matchDF$cellNames, ]
uniqueSamples <- unique(sampleNames)
matchedRNA <- ArchR:::.safeSubset(mat = matchedRNA, subsetRows = paste0(featureDF$name), subsetCols = matchDF$cellNames)
for (z in seq_along(uniqueSamples)) {
mat <- matchedRNA[, which(sampleNames == uniqueSamples[z]), drop = FALSE]
Group <- uniqueSamples[z]
o <- tryCatch({
h5delete(tmpFilei, paste0(Group))
}, error = function(x) {
})
o <- h5createGroup(tmpFilei, paste0(Group))
j <- Rle(findInterval(seq(mat@x) - 1, mat@p[-1]) + 1)
lengthRle <- length(j@lengths)
lengthI <- length(mat@i)
o <- ArchR:::.suppressAll(h5createDataset(tmpFilei,
paste0(Group, "/i"), storage.mode = "integer",
dims = c(lengthI, 1), level = getArchRH5Level()))
o <- ArchR:::.suppressAll(h5createDataset(tmpFilei,
paste0(Group, "/jLengths"), storage.mode = "integer",
dims = c(lengthRle, 1), level = getArchRH5Level()))
o <- ArchR:::.suppressAll(h5createDataset(tmpFilei,
paste0(Group, "/jValues"), storage.mode = "integer",
dims = c(lengthRle, 1), level = getArchRH5Level()))
o <- ArchR:::.suppressAll(h5createDataset(tmpFilei,
paste0(Group, "/x"), storage.mode = "double",
dims = c(lengthI, 1), level = getArchRH5Level()))
o <- ArchR:::.suppressAll(h5write(obj = mat@i + 1, file = tmpFilei,
name = paste0(Group, "/i")))
o <- ArchR:::.suppressAll(h5write(obj = j@lengths, file = tmpFilei,
name = paste0(Group, "/jLengths")))
o <- ArchR:::.suppressAll(h5write(obj = j@values, file = tmpFilei,
name = paste0(Group, "/jValues")))
o <- ArchR:::.suppressAll(h5write(obj = mat@x, file = tmpFilei,
name = paste0(Group, "/x")))
o <- ArchR:::.suppressAll(h5write(obj = colnames(mat),
file = tmpFilei, name = paste0(Group, "/cellNames")))
}
rm(matchedRNA, mat, j)
}
gc()
matchDF$Block <- Rle(i)
matchDF
}, threads = threads2) %>% Reduce("rbind", .)
if (plotUMAP) {
for (i in seq_along(blockList)) {
o <- tryCatch({
prefix <- sprintf("Block (%s of %s) :", i, length(blockList))
jointCCA <- readRDS(file.path(outDir3, paste0("Save-Block", i, "-JointCCA.rds")))
set.seed(1)
UMAPParams <- .mergeParams(UMAPParams, list(n_neighbors = 40, min_dist = 0.4, metric = "cosine", verbose = FALSE))
UMAPParams$X <- as.data.frame(jointCCA[, grep("CC_", colnames(jointCCA))])
UMAPParams$ret_nn <- FALSE
UMAPParams$ret_model <- FALSE
UMAPParams$n_threads <- 1
uwotUmap <- tryCatch({
do.call(uwot::umap, UMAPParams)
}, error = function(e) {
errorList <- UMAPParams
})
jointCCA$UMAP1 <- uwotUmap[, 1]
jointCCA$UMAP2 <- uwotUmap[, 2]
.safeSaveRDS(object = jointCCA, file = file.path(outDir3, paste0("Save-Block", i, "-JointCCA.rds")))
p1 <- ggPoint(x = uwotUmap[, 1], y = uwotUmap[, 2], color = jointCCA$Assay, randomize = TRUE,
size = 0.2, title = paste0(prefix, " colored by Assay"),
xlabel = "UMAP Dimension 1", ylabel = "UMAP Dimension 2",
rastr = TRUE) + theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(), axis.text.y = element_blank(),
axis.ticks.y = element_blank())
p2 <- ggPoint(x = uwotUmap[, 1], y = uwotUmap[, 2], color = jointCCA$Group, randomize = TRUE,
size = 0.2, title = paste0(prefix, " colored by scRNA Group"),
xlabel = "UMAP Dimension 1", ylabel = "UMAP Dimension 2",
rastr = TRUE) + theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(), axis.text.y = element_blank(),
axis.ticks.y = element_blank())
pdf(file.path(outDir3, paste0("Save-Block", i, "-JointCCA-UMAP.pdf")), width = 12, height = 6,
useDingbats = FALSE)
ggAlignPlots(p1, p2, type = "h")
dev.off()
}, error = function(e) {
})
}
}
if (addToArrow) {
matrixName <- ArchR:::.isProtectedArray(matrixName)
integrationFiles <- paste0(tmpFile, "-IntegrationBlock-", seq_along(blockList), ".h5")
if (!all(file.exists(integrationFiles))) {
stop("Something went wrong with integration as not all temporary files containing integrated RNA exist!")
}
h5list <- ArchR:::.safelapply(seq_along(integrationFiles), function(x) {
h5ls(integrationFiles[x])
}, threads = threads)
ArrowFiles <- getArrowFiles(ArchRProj)
allSamples <- names(ArrowFiles)
o <- .safelapply(seq_along(allSamples), function(y) {
sample <- allSamples[y]
prefix <- sprintf("%s (%s of %s)", sample, y, length(ArrowFiles))
sampleIF <- lapply(seq_along(h5list), function(x) {
if (any(h5list[[x]]$group == paste0("/", sample))) {
integrationFiles[x]
}
else {
NULL
}
}) %>% unlist
sampleMat <- lapply(seq_along(sampleIF), function(x) {
cellNames <- ArchR:::.h5read(sampleIF[x], paste0(sample, "/cellNames"))
mat <- sparseMatrix(i = ArchR:::.h5read(sampleIF[x],
paste0(sample, "/i"))[, 1], j = as.vector(Rle(ArchR:::.h5read(sampleIF[x],
paste0(sample, "/jValues"))[, 1], ArchR:::.h5read(sampleIF[x],
paste0(sample, "/jLengths"))[, 1])), x = ArchR:::.h5read(sampleIF[x],
paste0(sample, "/x"))[, 1], dims = c(nrow(featureDF),
length(cellNames)))
colnames(mat) <- cellNames
mat
}) %>% Reduce("cbind", .)
sampleMat@x <- exp(sampleMat@x) - 1
sampleMat <- ArchR:::.normalizeCols(sampleMat, scaleTo = scaleTo)
sampleMat <- drop0(sampleMat)
rownames(sampleMat) <- paste0(featureDF$name)
sampleMat <- sampleMat[, ArchRProj$cellNames[BiocGenerics::which(ArchRProj$Sample == sample)], drop = FALSE]
o <- ArchR:::.createArrowGroup(ArrowFile = ArrowFiles[sample],
group = matrixName, force = force)
o <- ArchR:::.initializeMat(ArrowFile = ArrowFiles[sample],
Group = matrixName, Class = "double", Units = "NormCounts",
cellNames = colnames(sampleMat), params = dfParams,
featureDF = featureDF, force = force)
o <- h5write(obj = dfAll[colnames(sampleMat), "predictionScore"],
file = ArrowFiles[sample], name = paste0(matrixName,
"/Info/predictionScore"))
o <- h5write(obj = dfAll[colnames(sampleMat), "predictedGroup"],
file = ArrowFiles[sample], name = paste0(matrixName,
"/Info/predictedGroup"))
o <- h5write(obj = dfAll[colnames(sampleMat), "predictedCell"],
file = ArrowFiles[sample], name = paste0(matrixName,
"/Info/predictedCell"))
for (z in seq_along(allChr)) {
chrz <- allChr[z]
idz <- BiocGenerics::which(featureDF$seqnames %bcin% chrz)
matz <- sampleMat[idz, , drop = FALSE]
stopifnot(identical(paste0(featureDF$name[idz]),
paste0(rownames(matz))))
o <- ArchR:::.addMatToArrow(mat = matz, ArrowFile = ArrowFiles[sample],
Group = paste0(matrixName, "/", chrz), binarize = FALSE,
addColSums = TRUE, addRowSums = TRUE, addRowVarsLog2 = TRUE,
logFile = logFile)
rm(matz)
if (z%%3 == 0 | z == length(allChr)) {
gc()
}
}
0
}, threads = threads)
o <- suppressWarnings(file.remove(integrationFiles))
}
ArchRProj <- addCellColData(ArchRProj = ArchRProj, cells = dfAll$cellNames,
data = dfAll$predictedCell, name = nameCell, force = TRUE)
ArchRProj <- addCellColData(ArchRProj = ArchRProj, cells = dfAll$cellNames,
data = dfAll$predictedGroup, name = nameGroup, force = TRUE)
ArchRProj <- addCellColData(ArchRProj = ArchRProj, cells = dfAll$cellNames,
data = dfAll$predictionScore, name = nameScore, force = TRUE)
return(ArchRProj)
}
I encountered this problem too. ArchR 1.0.2
` > sessionInfo()
R version 4.3.2 (2023-10-31) Platform: x86_64-conda-linux-gnu (64-bit) Running under: Ubuntu 20.04.5 LTS
Matrix products: default BLAS/LAPACK: /root/miniconda3/envs/seurat/lib/libopenblasp-r0.3.25.so; LAPACK version 3.11.0
Random number generation: RNG: L'Ecuyer-CMRG Normal: Inversion Sample: Rejection
locale: [1] C
time zone: Etc/UTC tzcode source: system (glibc)
attached base packages: [1] parallel stats4 grid stats graphics grDevices utils [8] datasets methods base
other attached packages: [1] Seurat_5.0.1 SeuratObject_5.0.1 [3] sp_2.1-2 rhdf5_2.44.0 [5] SummarizedExperiment_1.30.2 Biobase_2.60.0 [7] MatrixGenerics_1.12.2 Rcpp_1.0.11 [9] Matrix_1.6-4 GenomicRanges_1.52.0 [11] GenomeInfoDb_1.36.1 IRanges_2.34.1 [13] S4Vectors_0.38.1 BiocGenerics_0.46.0 [15] matrixStats_1.0.0 data.table_1.14.8 [17] stringr_1.5.1 plyr_1.8.9 [19] magrittr_2.0.3 ggplot2_3.4.4 [21] gtable_0.3.4 gtools_3.9.5 [23] gridExtra_2.3 ArchR_1.0.2
loaded via a namespace (and not attached): [1] RColorBrewer_1.1-3 jsonlite_1.8.7 spatstat.utils_3.0-4 [4] zlibbioc_1.46.0 vctrs_0.6.5 ROCR_1.0-11 [7] Cairo_1.6-2 spatstat.explore_3.2-5 RCurl_1.98-1.12 [10] htmltools_0.5.7 S4Arrays_1.2.0 curl_5.1.0 [13] Rhdf5lib_1.22.0 SparseArray_1.2.2 sctransform_0.4.1 [16] parallelly_1.36.0 KernSmooth_2.23-22 desc_1.4.2 [19] htmlwidgets_1.6.3 ica_1.0-3 plotly_4.10.3 [22] zoo_1.8-12 igraph_1.5.1 mime_0.12 [25] lifecycle_1.0.4 pkgconfig_2.0.3 R6_2.5.1 [28] fastmap_1.1.1 GenomeInfoDbData_1.2.11 fitdistrplus_1.1-11 [31] future_1.33.0 shiny_1.8.0 digest_0.6.33 [34] colorspace_2.1-0 ps_1.7.5 patchwork_1.1.3 [37] rprojroot_2.0.4 tensor_1.5 RSpectra_0.16-1 [40] irlba_2.3.5.1 progressr_0.14.0 fansi_1.0.5 [43] spatstat.sparse_3.0-3 httr_1.4.7 polyclip_1.10-6 [46] abind_1.4-5 compiler_4.3.2 remotes_2.4.2.1 [49] withr_2.5.2 fastDummies_1.7.3 pkgbuild_1.4.2 [52] R.utils_2.12.3 MASS_7.3-60 DelayedArray_0.28.0 [55] tools_4.3.2 lmtest_0.9-40 httpuv_1.6.12 [58] future.apply_1.11.0 goftest_1.2-3 R.oo_1.25.0 [61] glue_1.6.2 callr_3.7.3 nlme_3.1-164 [64] rhdf5filters_1.12.1 promises_1.2.1 Rtsne_0.16 [67] cluster_2.1.6 reshape2_1.4.4 generics_0.1.3 [70] spatstat.data_3.0-3 R.methodsS3_1.8.2 tidyr_1.3.0 [73] utf8_1.2.4 XVector_0.40.0 spatstat.geom_3.2-7 [76] RcppAnnoy_0.0.21 ggrepel_0.9.4 RANN_2.6.1 [79] pillar_1.9.0 spam_2.10-0 RcppHNSW_0.5.0 [82] later_1.3.1 splines_4.3.2 dplyr_1.1.4 [85] lattice_0.22-5 survival_3.5-7 deldir_2.0-2 [88] tidyselect_1.2.0 miniUI_0.1.1.1 pbapply_1.7-2 [91] scattermore_1.2 stringi_1.8.2 lazyeval_0.2.2 [94] codetools_0.2-19 tibble_3.2.1 BiocManager_1.30.22 [97] cli_3.6.1 uwot_0.1.16 xtable_1.8-4 [100] reticulate_1.34.0 processx_3.8.2 munsell_0.5.0 [103] spatstat.random_3.2-2 globals_0.16.2 png_0.1-8 [106] ellipsis_0.3.2 prettyunits_1.2.0 dotCall64_1.1-1 [109] bitops_1.0-7 listenv_0.9.0 viridisLite_0.4.2 [112] scales_1.3.0 ggridges_0.5.4 leiden_0.4.3.1 [115] purrr_1.0.2 crayon_1.5.2 rlang_1.1.2 [118] cowplot_1.1.1 `
Maybe it's the Seurat v5 causing this problem. I've encontered same problem with Seurat 5.0.1. I first convert my RNA assay from v5 to v3(or v4?)
seRNA[["RNA4"]] <- as(object = seRNA[["RNA"]], Class = "Assay")
seRNA[["RNA"]] <- seRNA[["RNA4"]]
save the seurat object, and reload using Seurat 4.3.0 and ArchR, then addGeneIntegrationMatrix() goes well.
Maybe re-analyze your seRNA from scratch would help as well......