ArchR icon indicating copy to clipboard operation
ArchR copied to clipboard

Error in addGeneIntegrationMatrix / abnormally high number of communities

Open Khreat0205 opened this issue 11 months ago • 12 comments

When executing the code provided, the following error occurs:

> mRCCv5
An object of class Seurat 
22921 features across 102723 samples within 1 assay 
Active assay: RNA (22921 features, 2000 variable features)
 2 layers present: counts, data
> proj

           ___      .______        ______  __    __  .______      
          /   \     |   _  \      /      ||  |  |  | |   _  \     
         /  ^  \    |  |_)  |    |  ,----'|  |__|  | |  |_)  |    
        /  /_\  \   |      /     |  |     |   __   | |      /     
       /  _____  \  |  |\  \\___ |  `----.|  |  |  | |  |\  \\___.
      /__/     \__\ | _| `._____| \______||__|  |__| | _| `._____|
    
class: ArchRProject 
outputDirectory: atac_RCC100 
samples(1): RCC100_ver2
sampleColData names(1): ArrowFiles
cellColData names(16): Sample TSSEnrichment ... BlacklistRatio
  Clusters
numberOfCells(1): 44844
medianTSS(1): 6.249
medianFrags(1): 3221

> proj <- addGeneIntegrationMatrix(
+   ArchRProj = proj, 
+   useMatrix = "GeneScoreMatrix",
+   matrixName = "GeneIntegrationMatrix",
+   reducedDims = "IterativeLSI",
+   seRNA = mRCCv5,
+   sampleCellsRNA = 5000,
+   addToArrow = F,
+   groupRNA = "BioClassification",
+   nameCell = "predictedCell_Un",
+   nameGroup = "predictedGroup_Un",
+   nameScore = "predictedScore_Un",
+   force = T
+ )

The single-cell ATAC-seq dataset consists of multiple samples, but the above function was executed only on a single sample, and even when performed on an ArchRProject that combines multiple samples, the same error occurs. I have also tried switching to different Seurat assay versions such as v3, v5, but the same error persists.

ArchR logging to : ArchRLogs/ArchR-addGeneIntegrationMatrix-104ce73bf6ad1-Date-2023-07-28_Time-01-19-53.700155.log
If there is an issue, please report to github with logFile!
2023-07-28 01:19:54.010202 : Running Seurat's Integration Stuart* et al 2019, 0.005 mins elapsed.
2023-07-28 01:19:54.041649 : Checking ATAC Input, 0.006 mins elapsed.
2023-07-28 01:19:59.577644 : Checking RNA Input, 0.098 mins elapsed.
Warning: The following arguments are not used: layer
2023-07-28 01:20:16.147696 : Found 17989 overlapping gene names from gene scores and rna matrix!, 0.374 mins elapsed.
2023-07-28 01:20:16.14888 : Creating Integration Blocks, 0.374 mins elapsed.
2023-07-28 01:20:18.104443 : Prepping Interation Data, 0.407 mins elapsed.
2023-07-28 01:20:18.623075 : Computing Integration in 5 Integration Blocks!, 0 mins elapsed.
Error in .safelapply(seq_along(blockList), function(i) { : 
Error Found Iteration 1 : 
	[1] "Error in slot(object = object, name = \"features\")[[layer]] <- features : \n  more elements supplied than there are to replace\n"
	<simpleError in slot(object = object, name = "features")[[layer]] <- features: more elements supplied than there are to replace>
Error Found Iteration 2 : 
	[1] "Error in slot(object = object, name = \"features\")[[layer]] <- features : \n  more elements supplied than there are to replace\n"
	<simpleError in slot(object = object, name = "features")[[layer]] <- features: more elements supplied than there are to replace>
Error Found Iteration 3 : 
	[1] "Error in slot(object = object, name = \"features\")[[layer]] <- features : \n  more elements supplied than there are to replace\n"
	<simpleError in slot(object = object, name = "features")[[layer]] <- features: more elements supplied than there are to replace>
Error Found Iteration 4 : 
	[1] "Error in slot(object = object, name = \"features\")[[layer]] <- features : 
In addition: Warning message:
In mclapply(..., mc.cores = threads, mc.preschedule = preschedule) :
  5 function calls resulted in an error

Below is the result of executing sessioninfo():

R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

Random number generation:
 RNG:     L'Ecuyer-CMRG 
 Normal:  Inversion 
 Sample:  Rejection 
 
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
 [1] parallel  stats4    grid      stats     graphics  grDevices utils     datasets  methods  
[10] base     

other attached packages:
 [1] hexbin_1.28.3                     nabor_0.5.0                      
 [3] uwot_0.1.16                       BSgenome.Hsapiens.UCSC.hg38_1.4.5
 [5] BSgenome_1.68.0                   rtracklayer_1.60.0               
 [7] Biostrings_2.68.1                 XVector_0.40.0                   
 [9] Seurat_4.9.9.9049                 SeuratObject_4.9.9.9085          
[11] sp_1.6-1                          trqwe_0.1                        
[13] rhdf5_2.44.0                      SummarizedExperiment_1.30.1      
[15] Biobase_2.60.0                    MatrixGenerics_1.12.0            
[17] Rcpp_1.0.11                       Matrix_1.6-0                     
[19] GenomicRanges_1.52.0              GenomeInfoDb_1.36.0              
[21] IRanges_2.34.0                    S4Vectors_0.38.1                 
[23] BiocGenerics_0.46.0               matrixStats_1.0.0                
[25] data.table_1.14.8                 stringr_1.5.0                    
[27] plyr_1.8.8                        magrittr_2.0.3                   
[29] ggplot2_3.4.2                     gtable_0.3.3                     
[31] gtools_3.9.4                      gridExtra_2.3                    
[33] ArchR_1.0.2                      

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3       rstudioapi_0.15.0        jsonlite_1.8.5          
  [4] spatstat.utils_3.0-3     farver_2.1.1             BiocIO_1.10.0           
  [7] zlibbioc_1.46.0          vctrs_0.6.3              ROCR_1.0-11             
 [10] Rsamtools_2.16.0         Cairo_1.6-0              spatstat.explore_3.2-1  
 [13] RCurl_1.98-1.12          htmltools_0.5.5          S4Arrays_1.0.4          
 [16] Rhdf5lib_1.22.0          sctransform_0.3.5        parallelly_1.36.0       
 [19] KernSmooth_2.23-22       htmlwidgets_1.6.2        ica_1.0-3               
 [22] plotly_4.10.2            zoo_1.8-12               GenomicAlignments_1.36.0
 [25] igraph_1.5.0             mime_0.12                lifecycle_1.0.3         
 [28] pkgconfig_2.0.3          R6_2.5.1                 fastmap_1.1.1           
 [31] GenomeInfoDbData_1.2.10  fitdistrplus_1.1-11      future_1.33.0           
 [34] shiny_1.7.4.1            digest_0.6.33            colorspace_2.1-0        
 [37] patchwork_1.1.2          tensor_1.5               RSpectra_0.16-1         
 [40] irlba_2.3.5.1            labeling_0.4.2           progressr_0.13.0        
 [43] fansi_1.0.4              spatstat.sparse_3.0-2    polyclip_1.10-4         
 [46] httr_1.4.6               abind_1.4-5              compiler_4.3.0          
 [49] withr_2.5.0              BiocParallel_1.34.1      fastDummies_1.6.3       
 [52] MASS_7.3-60              DelayedArray_0.26.2      rjson_0.2.21            
 [55] tools_4.3.0              lmtest_0.9-40            httpuv_1.6.11           
 [58] future.apply_1.11.0      goftest_1.2-3            glue_1.6.2              
 [61] restfulr_0.0.15          nlme_3.1-162             rhdf5filters_1.12.1     
 [64] promises_1.2.0.1         Rtsne_0.16               cluster_2.1.4           
 [67] reshape2_1.4.4           generics_0.1.3           spatstat.data_3.0-1     
 [70] tidyr_1.3.0              utf8_1.2.3               spatstat.geom_3.2-4     
 [73] RcppAnnoy_0.0.21         ggrepel_0.9.3            RANN_2.6.1              
 [76] pillar_1.9.0             spam_2.9-1               RcppHNSW_0.4.1          
 [79] later_1.3.1              splines_4.3.0            dplyr_1.1.2             
 [82] lattice_0.21-8           deldir_1.0-9             survival_3.5-5          
 [85] tidyselect_1.2.0         miniUI_0.1.1.1           pbapply_1.7-2           
 [88] scattermore_1.2          stringi_1.7.12           yaml_2.3.7              
 [91] lazyeval_0.2.2           codetools_0.2-19         tibble_3.2.1            
 [94] cli_3.6.1                xtable_1.8-4             reticulate_1.30         
 [97] munsell_0.5.0            spatstat.random_3.1-5    globals_0.16.2          
[100] png_0.1-8                XML_3.99-0.14            ellipsis_0.3.2          
[103] dotCall64_1.0-2          bitops_1.0-7             listenv_0.9.0           
[106] viridisLite_0.4.2        scales_1.2.1             ggridges_0.5.4          
[109] leiden_0.4.3             purrr_1.0.1              crayon_1.5.2            
[112] rlang_1.1.1              cowplot_1.1.1         

During the execution of doubletScoring, the following message was displayed, and when performing clustering, the number of communities was abnormally high. For the given sample, 130 communities were identified, and 117 singletons were merged into one cluster. In other datasets with 19 samples, the number of communities showed 10365, and even after 13 hours, the addCluster function is still running.

 Correlation of UMAP Projection is below 0.9 (normally this is ~0.99)
This means there is little heterogeneity in your sample, and thus doubletCalling is inaccurate. force = FALSE, thus returning -1 doubletScores and doubletEnrichments!

The purpose of this issue is to resolve the error in 'addGeneIntegrationMatrix'. Additionally, I would appreciate it if you could suggest appropriate options related to the 'addDoubletScores' function. Thank you.

Khreat0205 avatar Jul 28 '23 01:07 Khreat0205

Hi @Khreat0205! Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues.
It is worth noting that there are very few actual bugs in ArchR. If you are getting an error, it is probably something specific to your dataset, usage, or computational environment, all of which are extremely challenging to troubleshoot. As such, we require reproducible examples (preferably using the tutorial dataset) from users who want assistance. If you cannot reproduce your error, we will not be able to help. Before going through the work of making a reproducible example, search the previous Issues, Discussions, function definitions, or the ArchR manual and you will likely find the answers you are looking for. If your post does not contain a reproducible example, it is unlikely to receive a response.
__In addition to a reproducible example, you must do the following things before we help you, unless your original post already contained this information: 1. If you've encountered an error, have you already searched previous Issues to make sure that this hasn't already been solved? 2. Did you post your log file? If not, add it now. 3. Remove any screenshots that contain text and instead copy and paste the text using markdown's codeblock syntax (three consecutive backticks). You can do this by editing your original post.

rcorces avatar Jul 28 '23 01:07 rcorces

I am encountering an error when using the addGeneIntegrationMatrix function. Here is the error message I am getting:

Error in .safelapply(seq_along(blockList), function(i) { : 
Error Found Iteration 1 : 
	[1] "Error in slot(object = object, name = \"features\")[[layer]] <- features : \n  more elements supplied than there are to replace\n"
	<simpleError in slot(object = object, name = "features")[[layer]] <- features: more elements supplied than there are to replace>
Error Found Iteration 2 : 
	[1] "Error in slot(object = object, name = \"features\")[[layer]] <- features : \n  more elements supplied than there are to replace\n"
	<simpleError in slot(object = object, name = "features")[[layer]] <- features: more elements supplied than there are to replace>
Error Found Iteration 3 : 
	[1] "Error in slot(object = object, name = \"features\")[[layer]] <- features : \n  more elements supplied than there are to replace\n"
	<simpleError in slot(object = object, name = "features")[[layer]] <- features: more elements supplied than there are to replace>
Error Found Iteration 4 : 
	[1] "Error in slot(object = object, name = \"features\")[[layer]] <- features : 
In addition: Warning message:

To help you understand the full context of what I was doing when this error occurred, I have attached the log file from that time. Please note that I've looked through all previous issues related to this but couldn't find anything that might help solve my problem.

If you need more information about this error, please let me know. I am eager to resolve this issue as soon as possible.

Thank you.

ArchR.log

Khreat0205 avatar Jul 28 '23 01:07 Khreat0205

any update on this issue?

e-arslan avatar Sep 19 '23 19:09 e-arslan

I recently encountered a similar message:

Error in slot(object = object, name = "cells")[[layer]] <- cells : 
  more elements supplied than there are to replace

The only difference is that I have problem with cell names but not feature names. I also found that I only have this problem in the last integration block, but not the preceding ones. So to figure out why, I decided to run script pf addGeneIntegrationMatrix line by line. After a lot of digging around, I found that the reason for my error message is that the column names of this matrix is an array!!

mat <- ArchR:::.getPartialMatrix(getArrowFiles(subProj), featureDF = geneDF[geneDF$name %in% genesUse, ],
                                                       threads = 2, cellNames = subProj$cellNames, useMatrix = useMatrix, verbose = FALSE)

and

mat <- suppressMessages(imputeMatrix(mat = mat, imputeWeights = getImputeWeights(subProj), verbose = FALSE))
> typeof(colnames(mat[1:3,1:3]))
[1] "character"
> class(colnames(mat[1:3,1:3]))
[1] "array"

Converting them to character colnames(mat) <- as.character(colnames(mat)) solves the problem. But there's still two things that are strange:

  • This only happened in my most recent analysis. Previous analysis are totally fine. I do not recall updating anything recently.
  • This only happened to the cells in the last integration, no matter how many blocks there are. When I checked the mat from the first block:
> class(colnames(mat))
[1] "character"
> typeof(colnames(mat))
[1] "character"

I have no clue why these two things happen. But hopefully this is helpful. ArchR version: ArchR * 1.0.3 2023-02-18 [1] Github (GreenleafLab/ArchR@55ab73e)

Zepeng-Mu avatar Oct 13 '23 18:10 Zepeng-Mu

it also seems that in this case, "character" colnames were converted to "array" in imputeMatrix() step.

Zepeng-Mu avatar Oct 14 '23 16:10 Zepeng-Mu

First command isn’t working in my case

uqnsarke avatar Oct 29 '23 10:10 uqnsarke

which one?

Zepeng-Mu avatar Oct 29 '23 15:10 Zepeng-Mu

mat <- ArchR:::.getPartialMatrix(getArrowFiles(subProj), featureDF = geneDF[geneDF$name %in% genesUse, ], threads = 2, cellNames = subProj$cellNames, useMatrix = useMatrix, verbose = FALSE)

Would you mind to share your full code if possible? I appreciate your willingness to help. Thanks

uqnsarke avatar Oct 29 '23 16:10 uqnsarke

I see. I think you might see the error because this is extracted from the function, so you cannot just run it independently without getting the intermediate inputs. Here is the full function. Note two things though: (1) this function is just the addGeneIntegration function, I only added line 225, 226 in this function; (2) this function does not generate a log file, because logging functions in ArchR are not exported so it's a bit difficult to work with outside of ArchR; you will have to wait until the whole function finishes to see if you get the results in your ArchR project.

testIntegration <- function (ArchRProj = NULL, useMatrix = "GeneScoreMatrix", matrixName = "GeneIntegrationMatrix",
                             reducedDims = "IterativeLSI", seRNA = NULL, groupATAC = NULL,
                             groupRNA = NULL, groupList = NULL, sampleCellsATAC = 10000,
                             sampleCellsRNA = 10000, embeddingATAC = NULL, embeddingRNA = NULL,
                             dimsToUse = 1:30, scaleDims = NULL, corCutOff = 0.75, plotUMAP = TRUE,
                             UMAPParams = list(n_neighbors = 40, min_dist = 0.4, metric = "cosine",
                                               verbose = FALSE), nGenes = 2000, useImputation = TRUE,
                             reduction = "cca", addToArrow = TRUE, scaleTo = 10000, genesUse = NULL,
                             nameCell = "predictedCell", nameGroup = "predictedGroup",
                             nameScore = "predictedScore", transferParams = list(), threads = getArchRThreads(),
                             verbose = TRUE, force = FALSE, logFile = createLogFile("addGeneIntegrationMatrix"),
                             ...)
{
  if (is.null(groupList)) {
    groupList <- SimpleList()
    groupList[[1]] <- SimpleList(ATAC = ArchRProj$cellNames, RNA = colnames(seRNA))
  }

  if (useMatrix %ni% getAvailableMatrices(ArchRProj)) {
    stop("Matrix name provided to useMatrix does not exist in ArchRProject!")
  }

  if (!is.null(groupATAC)) {
    dfATAC <- getCellColData(ArchRProj = ArchRProj, select = groupATAC, drop = FALSE)
  }

  nCell <- rep(0, length(ArchRProj$cellNames))
  names(nCell) <- ArchRProj$cellNames
  groupList <- lapply(seq_along(groupList), function(x) {
    ATAC <- groupList[[x]]$ATAC
    if (!is.null(groupATAC)) {
      if (any(ATAC %in% dfATAC[, 1])) {
        idx <- which(ATAC %in% dfATAC[, 1])
        ATAC2 <- rownames(dfATAC)[which(dfATAC[, 1] %in% ATAC[idx])]
        if (length(idx) == length(ATAC)) {
          ATAC <- ATAC2
        }
        else {
          ATAC <- c(ATAC[-idx], ATAC2)
        }
      }
    }
    SimpleList(ATAC = ATAC, RNA = groupList[[x]]$RNA)
  }) %>% SimpleList

  for (i in seq_along(groupList)) {
    nCell[groupList[[i]]$ATAC] <- nCell[groupList[[i]]$ATAC] + 1
  }

  if (!all(nCell == 1)) {
    stop("Missing ", length(which(nCell == 0)), " cells. Found ",
         length(which(nCell > 1)), " overlapping cells from ArchRProj in groupList! Cannot have overlapping/missing cells in ATAC input, check 'groupList' argument!")
  }

  if (inherits(seRNA, "SummarizedExperiment")) {
    seuratRNA <- CreateSeuratObject(counts = assay(seRNA))
    if (groupRNA %ni% colnames(colData(seRNA))) {
      stop("groupRNA not in colData of seRNA")
    }
    seuratRNA$Group <- paste0(colData(seRNA)[, groupRNA, drop = TRUE])
    rm(seRNA)
  } else {
    if (groupRNA %ni% colnames([email protected])) {
      stop("groupRNA not in meta.data of Seurat Object")
    }
    seuratRNA <- seRNA
    seuratRNA$Group <- paste0([email protected][, groupRNA])
    rm(seRNA)
  }

  if ("RNA" %in% names(seuratRNA@assays)) {
    DefaultAssay(seuratRNA) <- "RNA"
  } else {
    stop("'RNA' is not present in Seurat Object's Assays! Please make sure that this assay is present!")
  }

  gc()
  if (!is.null(groupRNA)) {
    dfRNA <- DataFrame(row.names = colnames(seuratRNA), Group = seuratRNA$Group)
  }

  groupList <- lapply(seq_along(groupList), function(x) {
    RNA <- groupList[[x]]$RNA
    if (!is.null(groupRNA)) {
      if (any(RNA %in% dfRNA[, 1])) {
        idx <- which(RNA %in% dfRNA[, 1])
        RNA2 <- rownames(dfRNA)[which(dfRNA[, 1] %in% RNA[idx])]
        if (length(idx) == length(RNA)) {
          RNA <- RNA2
        }
        else {
          RNA <- c(RNA[-idx], RNA2)
        }
      }
    }
    SimpleList(ATAC = groupList[[x]]$ATAC, RNA = RNA)
  }) %>% SimpleList

  cellRNA <- unlist(lapply(groupList, function(x) x$RNA))
  if (!all(cellRNA %in% colnames(seuratRNA))) {
    stop("Found cells for RNA not in colnames(seRNA)! Please retry your input!")
  }

  seuratRNA <- seuratRNA[, unique(cellRNA)]
  seuratRNA <- NormalizeData(object = seuratRNA, verbose = FALSE)
  geneDF <- ArchR:::.getFeatureDF(getArrowFiles(ArchRProj), useMatrix)
  sumOverlap <- sum(unique(geneDF$name) %in% unique(rownames(seuratRNA)))
  if (sumOverlap < 5) {
    stop("Error not enough overlaps (", sumOverlap, ") between gene names from gene scores (ArchR) and rna matrix (seRNA)!")
  }

  blockList <- SimpleList()
  for (i in seq_along(groupList)) {
    gLi <- groupList[[i]]
    if (length(gLi$ATAC) > sampleCellsATAC) {
      if (!is.null(embeddingATAC)) {
        probATAC <- ArchR:::.getDensity(embeddingATAC[gLi$ATAC, 1], embeddingATAC[gLi$ATAC, 2])$density
        probATAC <- probATAC/max(probATAC)
        cellsATAC <- gLi$ATAC[order(probATAC, decreasing = TRUE)]
      }
      else {
        cellsATAC <- sample(gLi$ATAC, length(gLi$ATAC))
      }
      cutoffs <- lapply(seq_len(1000), function(x) length(gLi$ATAC)/x) %>%
        unlist
      blockSize <- ceiling(min(cutoffs[order(abs(cutoffs - sampleCellsATAC))[1]] + 1, length(gLi$ATAC)))
      nBlocks <- ceiling(length(gLi$ATAC)/blockSize)
      blocks <- lapply(seq_len(nBlocks), function(x) {
        cellsATAC[seq(x, length(cellsATAC), nBlocks)]
      }) %>% SimpleList
    }
    else {
      blocks <- list(gLi$ATAC)
    }
    if (!is.null(embeddingRNA)) {
      probRNA <- .getDensity(embeddingRNA[gLi$RNA, 1],
                             embeddingRNA[gLi$RNA, 2])$density
      probRNA <- probRNA/max(probRNA)
    }
    else {
      probRNA <- rep(1, length(gLi$RNA))
    }
    blockListi <- lapply(seq_along(blocks), function(x) {
      SimpleList(ATAC = blocks[[x]], RNA = sample(x = gLi$RNA,
                                                  size = min(sampleCellsRNA, length(gLi$RNA)),
                                                  prob = probRNA))
    }) %>% SimpleList
    blockList <- c(blockList, blockListi)
  }

  rm(groupList)
  subProj <- ArchRProj
  subProj@imputeWeights <- SimpleList()
  geneDF <- ArchR:::.getFeatureDF(getArrowFiles(subProj), useMatrix)
  geneDF <- geneDF[geneDF$name %in% rownames(seuratRNA), , drop = FALSE]
  splitGeneDF <- S4Vectors::split(geneDF, geneDF$seqnames)
  featureDF <- lapply(splitGeneDF, function(x) {
    x$idx <- seq_len(nrow(x))
    return(x)
  }) %>% Reduce("rbind", .)

  dfParams <- data.frame(reduction = reduction)
  allChr <- unique(featureDF$seqnames)
  tmpFile <- ArchR:::.tempfile()
  o <- suppressWarnings(file.remove(paste0(tmpFile, "-IntegrationBlock-", seq_along(blockList), ".h5")))
  h5lock <- setArchRLocking()
  if (h5lock) {
    if (threads > 1) {
      message("subThreading Disabled since ArchRLocking is TRUE see `addArchRLocking`")
      threads <- 1
    }
  } else {
    if (threads > 1) {
      message("subThreading Enabled since ArchRLocking is FALSE see `addArchRLocking`")
    }
  }

  rD <- getReducedDims(ArchRProj = ArchRProj, reducedDims = reducedDims,
                       corCutOff = corCutOff, dimsToUse = dimsToUse)
  outDir1 <- getOutputDirectory(ArchRProj)
  outDir2 <- file.path(outDir1, "RNAIntegration")
  outDir3 <- file.path(outDir2, matrixName)
  dir.create(outDir1, showWarnings = FALSE)
  dir.create(outDir2, showWarnings = FALSE)
  dir.create(outDir3, showWarnings = FALSE)
  prevFiles <- list.files(outDir3, full.names = TRUE)
  prevFiles <- ArchR:::.suppressAll(file.remove(prevFiles))

  threads2 <- max(ceiling(threads * 0.75), 1)
  dfAll <- ArchR:::.safelapply(seq_along(blockList), function(i) {
    prefix <- sprintf("Block (%s of %s) :", i, length(blockList))
    blocki <- blockList[[i]]
    subProj@cellColData <- subProj@cellColData[blocki$ATAC, ]
    subProj@sampleColData <- subProj@sampleColData[unique(subProj$Sample), , drop = FALSE]
    subRNA <- seuratRNA[, blocki$RNA]
    subRNA <- subRNA[rownames(subRNA) %in% geneDF$name, ]
    subRNA <- FindVariableFeatures(object = subRNA, nfeatures = nGenes, verbose = FALSE)
    subRNA <- ScaleData(object = subRNA, verbose = FALSE)
    if (is.null(genesUse)) {
      genesUse <- VariableFeatures(object = subRNA)
    }

    mat <- ArchR:::.getPartialMatrix(getArrowFiles(subProj), featureDF = geneDF[geneDF$name %in% genesUse, ],
                                     threads = 2, cellNames = subProj$cellNames, useMatrix = useMatrix, verbose = FALSE)
    rownames(mat) <- geneDF[geneDF$name %in% genesUse, "name"]

    if (useImputation) {

      imputeParams <- list()
      imputeParams$ArchRProj <- subProj
      imputeParams$randomSuffix <- TRUE
      imputeParams$reducedDims <- reducedDims
      imputeParams$dimsToUse <- dimsToUse
      imputeParams$scaleDims <- scaleDims
      imputeParams$corCutOff <- corCutOff
      imputeParams$threads <- 1
      # imputeParams$logFile <- logFile
      subProj <- suppressMessages(do.call(addImputeWeights, imputeParams))
      mat <- suppressMessages(imputeMatrix(mat = mat, imputeWeights = getImputeWeights(subProj), verbose = FALSE))
      o <- suppressWarnings(file.remove(unlist(getImputeWeights(subProj)[[1]])))

    }

    mat <- log(mat + 1)
    colnames(mat) <- as.character(colnames(mat))
    rownames(mat) <- as.character(rownames(mat))

    seuratATAC <- Seurat::CreateSeuratObject(counts = mat[head(seq_len(nrow(mat)), 5), , drop = FALSE])
    seuratATAC[["GeneScore"]] <- Seurat::CreateAssayObject(counts = mat)
    rm(mat)
    DefaultAssay(seuratATAC) <- "GeneScore"
    seuratATAC <- Seurat::ScaleData(seuratATAC, verbose = FALSE)
    transferAnchors <- ArchR:::.retryCatch({
      gc()
      Seurat::FindTransferAnchors(reference = subRNA, query = seuratATAC, reduction = reduction,
                                  features = genesUse, verbose = FALSE, ...)
    }, maxAttempts = 2, logFile = logFile)

    rDSub <- rD[colnames(seuratATAC), , drop = FALSE]

    transferParams$anchorset <- transferAnchors
    transferParams$weight.reduction <- CreateDimReducObject(embeddings = rDSub, key = "LSI_", assay = DefaultAssay(seuratATAC))
    transferParams$verbose <- FALSE
    transferParams$dims <- seq_len(ncol(rDSub))

    transferParams$refdata <- subRNA$Group
    rnaLabels <- do.call(Seurat::TransferData, transferParams)

    transferParams$refdata <- colnames(subRNA)
    rnaLabels2 <- do.call(Seurat::TransferData, transferParams)[, 1]
    if (addToArrow) {

      transferParams$refdata <- GetAssayData(subRNA, assay = "RNA", slot = "data")
      gc()
      matchedRNA <- do.call(Seurat::TransferData, transferParams)
      matchedRNA <- matchedRNA@data
    }
    matchDF <- DataFrame(cellNames = colnames(seuratATAC),
                         predictionScore = rnaLabels$prediction.score.max,
                         predictedGroup = rnaLabels$predicted.id, predictedCell = rnaLabels2)
    rownames(matchDF) <- matchDF$cellNames

    jointCCA <- DataFrame([email protected][[1]]@[email protected])
    jointCCA$Assay <- ifelse(endsWith(rownames(jointCCA), "_reference"), "RNA", "ATAC")
    jointCCA$Group <- NA
    jointCCA$Score <- NA
    jointCCA[paste0(colnames(subRNA), "_reference"), "Group"] <- subRNA$Group
    jointCCA[paste0(matchDF$cellNames, "_query"), "Group"] <- matchDF$predictedGroup
    jointCCA[paste0(matchDF$cellNames, "_query"), "Score"] <- matchDF$predictionScore
    ArchR:::.safeSaveRDS(object = jointCCA, file = file.path(outDir3, paste0("Save-Block", i, "-JointCCA.rds")))
    rm(transferParams, transferAnchors)
    gc()
    if (addToArrow) {

      tmpFilei <- paste0(tmpFile, "-IntegrationBlock-", i, ".h5")
      o <- h5createFile(tmpFilei)
      sampleNames <- getCellColData(subProj, "Sample")[matchDF$cellNames, ]
      uniqueSamples <- unique(sampleNames)
      matchedRNA <- ArchR:::.safeSubset(mat = matchedRNA, subsetRows = paste0(featureDF$name), subsetCols = matchDF$cellNames)
      for (z in seq_along(uniqueSamples)) {
        mat <- matchedRNA[, which(sampleNames == uniqueSamples[z]), drop = FALSE]
        Group <- uniqueSamples[z]
        o <- tryCatch({
          h5delete(tmpFilei, paste0(Group))
        }, error = function(x) {
        })

        o <- h5createGroup(tmpFilei, paste0(Group))
        j <- Rle(findInterval(seq(mat@x) - 1, mat@p[-1]) + 1)
        lengthRle <- length(j@lengths)
        lengthI <- length(mat@i)
        o <- ArchR:::.suppressAll(h5createDataset(tmpFilei,
                                                  paste0(Group, "/i"), storage.mode = "integer",
                                                  dims = c(lengthI, 1), level = getArchRH5Level()))
        o <- ArchR:::.suppressAll(h5createDataset(tmpFilei,
                                                  paste0(Group, "/jLengths"), storage.mode = "integer",
                                                  dims = c(lengthRle, 1), level = getArchRH5Level()))
        o <- ArchR:::.suppressAll(h5createDataset(tmpFilei,
                                                  paste0(Group, "/jValues"), storage.mode = "integer",
                                                  dims = c(lengthRle, 1), level = getArchRH5Level()))
        o <- ArchR:::.suppressAll(h5createDataset(tmpFilei,
                                                  paste0(Group, "/x"), storage.mode = "double",
                                                  dims = c(lengthI, 1), level = getArchRH5Level()))
        o <- ArchR:::.suppressAll(h5write(obj = mat@i + 1, file = tmpFilei,
                                          name = paste0(Group, "/i")))
        o <- ArchR:::.suppressAll(h5write(obj = j@lengths, file = tmpFilei,
                                          name = paste0(Group, "/jLengths")))
        o <- ArchR:::.suppressAll(h5write(obj = j@values, file = tmpFilei,
                                          name = paste0(Group, "/jValues")))
        o <- ArchR:::.suppressAll(h5write(obj = mat@x, file = tmpFilei,
                                          name = paste0(Group, "/x")))
        o <- ArchR:::.suppressAll(h5write(obj = colnames(mat),
                                          file = tmpFilei, name = paste0(Group, "/cellNames")))
      }
      rm(matchedRNA, mat, j)
    }

    gc()
    matchDF$Block <- Rle(i)
    matchDF
  }, threads = threads2) %>% Reduce("rbind", .)

  if (plotUMAP) {
    for (i in seq_along(blockList)) {
      o <- tryCatch({
        prefix <- sprintf("Block (%s of %s) :", i, length(blockList))

        jointCCA <- readRDS(file.path(outDir3, paste0("Save-Block", i, "-JointCCA.rds")))
        set.seed(1)
        UMAPParams <- .mergeParams(UMAPParams, list(n_neighbors = 40, min_dist = 0.4, metric = "cosine", verbose = FALSE))
        UMAPParams$X <- as.data.frame(jointCCA[, grep("CC_", colnames(jointCCA))])
        UMAPParams$ret_nn <- FALSE
        UMAPParams$ret_model <- FALSE
        UMAPParams$n_threads <- 1
        uwotUmap <- tryCatch({
          do.call(uwot::umap, UMAPParams)
        }, error = function(e) {
          errorList <- UMAPParams
        })

        jointCCA$UMAP1 <- uwotUmap[, 1]
        jointCCA$UMAP2 <- uwotUmap[, 2]
        .safeSaveRDS(object = jointCCA, file = file.path(outDir3, paste0("Save-Block", i, "-JointCCA.rds")))
        p1 <- ggPoint(x = uwotUmap[, 1], y = uwotUmap[, 2], color = jointCCA$Assay, randomize = TRUE,
                      size = 0.2, title = paste0(prefix, " colored by Assay"),
                      xlabel = "UMAP Dimension 1", ylabel = "UMAP Dimension 2",
                      rastr = TRUE) + theme(axis.text.x = element_blank(),
                                            axis.ticks.x = element_blank(), axis.text.y = element_blank(),
                                            axis.ticks.y = element_blank())
        p2 <- ggPoint(x = uwotUmap[, 1], y = uwotUmap[, 2], color = jointCCA$Group, randomize = TRUE,
                      size = 0.2, title = paste0(prefix, " colored by scRNA Group"),
                      xlabel = "UMAP Dimension 1", ylabel = "UMAP Dimension 2",
                      rastr = TRUE) + theme(axis.text.x = element_blank(),
                                            axis.ticks.x = element_blank(), axis.text.y = element_blank(),
                                            axis.ticks.y = element_blank())
        pdf(file.path(outDir3, paste0("Save-Block", i, "-JointCCA-UMAP.pdf")), width = 12, height = 6,
            useDingbats = FALSE)
        ggAlignPlots(p1, p2, type = "h")
        dev.off()
      }, error = function(e) {
      })
    }
  }
  if (addToArrow) {
    matrixName <- ArchR:::.isProtectedArray(matrixName)
    integrationFiles <- paste0(tmpFile, "-IntegrationBlock-", seq_along(blockList), ".h5")
    if (!all(file.exists(integrationFiles))) {
      stop("Something went wrong with integration as not all temporary files containing integrated RNA exist!")
    }

    h5list <- ArchR:::.safelapply(seq_along(integrationFiles), function(x) {
      h5ls(integrationFiles[x])
    }, threads = threads)

    ArrowFiles <- getArrowFiles(ArchRProj)
    allSamples <- names(ArrowFiles)
    o <- .safelapply(seq_along(allSamples), function(y) {
      sample <- allSamples[y]
      prefix <- sprintf("%s (%s of %s)", sample, y, length(ArrowFiles))

      sampleIF <- lapply(seq_along(h5list), function(x) {
        if (any(h5list[[x]]$group == paste0("/", sample))) {
          integrationFiles[x]
        }
        else {
          NULL
        }
      }) %>% unlist
      sampleMat <- lapply(seq_along(sampleIF), function(x) {
        cellNames <- ArchR:::.h5read(sampleIF[x], paste0(sample, "/cellNames"))
        mat <- sparseMatrix(i = ArchR:::.h5read(sampleIF[x],
                                                paste0(sample, "/i"))[, 1], j = as.vector(Rle(ArchR:::.h5read(sampleIF[x],
                                                                                                              paste0(sample, "/jValues"))[, 1], ArchR:::.h5read(sampleIF[x],
                                                                                                                                                                paste0(sample, "/jLengths"))[, 1])), x = ArchR:::.h5read(sampleIF[x],
                                                                                                                                                                                                                         paste0(sample, "/x"))[, 1], dims = c(nrow(featureDF),
                                                                                                                                                                                                                                                              length(cellNames)))
        colnames(mat) <- cellNames
        mat
      }) %>% Reduce("cbind", .)
      sampleMat@x <- exp(sampleMat@x) - 1
      sampleMat <- ArchR:::.normalizeCols(sampleMat, scaleTo = scaleTo)
      sampleMat <- drop0(sampleMat)
      rownames(sampleMat) <- paste0(featureDF$name)
      sampleMat <- sampleMat[, ArchRProj$cellNames[BiocGenerics::which(ArchRProj$Sample == sample)], drop = FALSE]
      o <- ArchR:::.createArrowGroup(ArrowFile = ArrowFiles[sample],
                                     group = matrixName, force = force)
      o <- ArchR:::.initializeMat(ArrowFile = ArrowFiles[sample],
                                  Group = matrixName, Class = "double", Units = "NormCounts",
                                  cellNames = colnames(sampleMat), params = dfParams,
                                  featureDF = featureDF, force = force)
      o <- h5write(obj = dfAll[colnames(sampleMat), "predictionScore"],
                   file = ArrowFiles[sample], name = paste0(matrixName,
                                                            "/Info/predictionScore"))
      o <- h5write(obj = dfAll[colnames(sampleMat), "predictedGroup"],
                   file = ArrowFiles[sample], name = paste0(matrixName,
                                                            "/Info/predictedGroup"))
      o <- h5write(obj = dfAll[colnames(sampleMat), "predictedCell"],
                   file = ArrowFiles[sample], name = paste0(matrixName,
                                                            "/Info/predictedCell"))

      for (z in seq_along(allChr)) {
        chrz <- allChr[z]

        idz <- BiocGenerics::which(featureDF$seqnames %bcin% chrz)
        matz <- sampleMat[idz, , drop = FALSE]
        stopifnot(identical(paste0(featureDF$name[idz]),
                            paste0(rownames(matz))))
        o <- ArchR:::.addMatToArrow(mat = matz, ArrowFile = ArrowFiles[sample],
                                    Group = paste0(matrixName, "/", chrz), binarize = FALSE,
                                    addColSums = TRUE, addRowSums = TRUE, addRowVarsLog2 = TRUE,
                                    logFile = logFile)
        rm(matz)
        if (z%%3 == 0 | z == length(allChr)) {
          gc()
        }
      }
      0
    }, threads = threads)
    o <- suppressWarnings(file.remove(integrationFiles))
  }
  ArchRProj <- addCellColData(ArchRProj = ArchRProj, cells = dfAll$cellNames,
                              data = dfAll$predictedCell, name = nameCell, force = TRUE)
  ArchRProj <- addCellColData(ArchRProj = ArchRProj, cells = dfAll$cellNames,
                              data = dfAll$predictedGroup, name = nameGroup, force = TRUE)
  ArchRProj <- addCellColData(ArchRProj = ArchRProj, cells = dfAll$cellNames,
                              data = dfAll$predictionScore, name = nameScore, force = TRUE)
  return(ArchRProj)
}

Zepeng-Mu avatar Oct 29 '23 16:10 Zepeng-Mu

I encountered this problem too. ArchR 1.0.2

` > sessionInfo()

R version 4.3.2 (2023-10-31) Platform: x86_64-conda-linux-gnu (64-bit) Running under: Ubuntu 20.04.5 LTS

Matrix products: default BLAS/LAPACK: /root/miniconda3/envs/seurat/lib/libopenblasp-r0.3.25.so; LAPACK version 3.11.0

Random number generation: RNG: L'Ecuyer-CMRG Normal: Inversion Sample: Rejection

locale: [1] C

time zone: Etc/UTC tzcode source: system (glibc)

attached base packages: [1] parallel stats4 grid stats graphics grDevices utils [8] datasets methods base

other attached packages: [1] Seurat_5.0.1 SeuratObject_5.0.1 [3] sp_2.1-2 rhdf5_2.44.0 [5] SummarizedExperiment_1.30.2 Biobase_2.60.0 [7] MatrixGenerics_1.12.2 Rcpp_1.0.11 [9] Matrix_1.6-4 GenomicRanges_1.52.0 [11] GenomeInfoDb_1.36.1 IRanges_2.34.1 [13] S4Vectors_0.38.1 BiocGenerics_0.46.0 [15] matrixStats_1.0.0 data.table_1.14.8 [17] stringr_1.5.1 plyr_1.8.9 [19] magrittr_2.0.3 ggplot2_3.4.4 [21] gtable_0.3.4 gtools_3.9.5 [23] gridExtra_2.3 ArchR_1.0.2

loaded via a namespace (and not attached): [1] RColorBrewer_1.1-3 jsonlite_1.8.7 spatstat.utils_3.0-4 [4] zlibbioc_1.46.0 vctrs_0.6.5 ROCR_1.0-11 [7] Cairo_1.6-2 spatstat.explore_3.2-5 RCurl_1.98-1.12 [10] htmltools_0.5.7 S4Arrays_1.2.0 curl_5.1.0 [13] Rhdf5lib_1.22.0 SparseArray_1.2.2 sctransform_0.4.1 [16] parallelly_1.36.0 KernSmooth_2.23-22 desc_1.4.2 [19] htmlwidgets_1.6.3 ica_1.0-3 plotly_4.10.3 [22] zoo_1.8-12 igraph_1.5.1 mime_0.12 [25] lifecycle_1.0.4 pkgconfig_2.0.3 R6_2.5.1 [28] fastmap_1.1.1 GenomeInfoDbData_1.2.11 fitdistrplus_1.1-11 [31] future_1.33.0 shiny_1.8.0 digest_0.6.33 [34] colorspace_2.1-0 ps_1.7.5 patchwork_1.1.3 [37] rprojroot_2.0.4 tensor_1.5 RSpectra_0.16-1 [40] irlba_2.3.5.1 progressr_0.14.0 fansi_1.0.5 [43] spatstat.sparse_3.0-3 httr_1.4.7 polyclip_1.10-6 [46] abind_1.4-5 compiler_4.3.2 remotes_2.4.2.1 [49] withr_2.5.2 fastDummies_1.7.3 pkgbuild_1.4.2 [52] R.utils_2.12.3 MASS_7.3-60 DelayedArray_0.28.0 [55] tools_4.3.2 lmtest_0.9-40 httpuv_1.6.12 [58] future.apply_1.11.0 goftest_1.2-3 R.oo_1.25.0 [61] glue_1.6.2 callr_3.7.3 nlme_3.1-164 [64] rhdf5filters_1.12.1 promises_1.2.1 Rtsne_0.16 [67] cluster_2.1.6 reshape2_1.4.4 generics_0.1.3 [70] spatstat.data_3.0-3 R.methodsS3_1.8.2 tidyr_1.3.0 [73] utf8_1.2.4 XVector_0.40.0 spatstat.geom_3.2-7 [76] RcppAnnoy_0.0.21 ggrepel_0.9.4 RANN_2.6.1 [79] pillar_1.9.0 spam_2.10-0 RcppHNSW_0.5.0 [82] later_1.3.1 splines_4.3.2 dplyr_1.1.4 [85] lattice_0.22-5 survival_3.5-7 deldir_2.0-2 [88] tidyselect_1.2.0 miniUI_0.1.1.1 pbapply_1.7-2 [91] scattermore_1.2 stringi_1.8.2 lazyeval_0.2.2 [94] codetools_0.2-19 tibble_3.2.1 BiocManager_1.30.22 [97] cli_3.6.1 uwot_0.1.16 xtable_1.8-4 [100] reticulate_1.34.0 processx_3.8.2 munsell_0.5.0 [103] spatstat.random_3.2-2 globals_0.16.2 png_0.1-8 [106] ellipsis_0.3.2 prettyunits_1.2.0 dotCall64_1.1-1 [109] bitops_1.0-7 listenv_0.9.0 viridisLite_0.4.2 [112] scales_1.3.0 ggridges_0.5.4 leiden_0.4.3.1 [115] purrr_1.0.2 crayon_1.5.2 rlang_1.1.2 [118] cowplot_1.1.1 `

biobai avatar Dec 20 '23 10:12 biobai

Maybe it's the Seurat v5 causing this problem. I've encontered same problem with Seurat 5.0.1. I first convert my RNA assay from v5 to v3(or v4?) seRNA[["RNA4"]] <- as(object = seRNA[["RNA"]], Class = "Assay") seRNA[["RNA"]] <- seRNA[["RNA4"]] save the seurat object, and reload using Seurat 4.3.0 and ArchR, then addGeneIntegrationMatrix() goes well.

Maybe re-analyze your seRNA from scratch would help as well......

Shuting-Zhao avatar Dec 27 '23 01:12 Shuting-Zhao