Scissor Error Accessing 'data' Slot in Seurat Assay Object When Using Scissor Function

Hello,

I am experiencing an issue with the Scissor function in Seurat while analyzing single-cell RNA-seq data. I am trying to integrate my Seurat object with Scissor for further analysis, but I encounter an error related to accessing the 'data' slot in the Assay5 object.

Here's the code that leads to the error:

infos1 <- Scissor(bulk_dataset, sc_dataset, phenotype, alpha = 0.05, family = "cox", Save_file = 'Scissor_LUAD_survival.RData') Error in as.matrix(sc_dataset@assays$RNA@data) : no slot of name"data" for this object of class "Assay5"

str(sc_dataset) Formal class 'Seurat' [package "SeuratObject"] with 13 slots ..@ assays :List of 1 .. ..$ RNA:Formal class 'Assay5' [package "SeuratObject"] with 8 slots .. .. .. ..@ layers :List of 3 .. .. .. .. ..$ counts :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots .. .. .. .. .. .. ..@ i : int [1:5841015] 2 38 46 49 54 103 114 122 123 130 ... .. .. .. .. .. .. ..@ p : int [1:4103] 0 298 1495 2854 3616 7487 7852 11819 13341 13870 ... .. .. .. .. .. .. ..@ Dim : int [1:2] 5857 4102 .. .. .. .. .. .. ..@ Dimnames:List of 2 .. .. .. .. .. .. .. ..$ : NULL .. .. .. .. .. .. .. ..$ : NULL .. .. .. .. .. .. ..@ x : num [1:5841015] 2 1 1 1 1 3 1 1 1 1 ... .. .. .. .. .. .. ..@ factors : list() .. .. .. .. ..$ data :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots .. .. .. .. .. .. ..@ i : int [1:5841015] 2 38 46 49 54 103 114 122 123 130 ... .. .. .. .. .. .. ..@ p : int [1:4103] 0 298 1495 2854 3616 7487 7852 11819 13341 13870 ... .. .. .. .. .. .. ..@ Dim : int [1:2] 5857 4102 .. .. .. .. .. .. ..@ Dimnames:List of 2 .. .. .. .. .. .. .. ..$ : NULL .. .. .. .. .. .. .. ..$ : NULL .. .. .. .. .. .. ..@ x : num [1:5841015] 3.53 2.87 2.87 2.87 2.87 ... .. .. .. .. .. .. ..@ factors : list() .. .. .. .. ..$ scale.data: num [1:2000, 1:4102] -0.44 0.798 -0.491 -0.266 -0.552 ...

Nov 25 '23 06:11 lily424

You can use this function instead of the original Scissor function. The reason for this error is that Seurat v5 updated its object structure.

Scissor_m=function (bulk_dataset, sc_dataset, phenotype, tag = NULL, alpha = NULL, 
          cutoff = 0.2, family = c("gaussian", "binomial", "cox"), 
          Save_file = "Scissor_inputs.RData", Load_file = NULL) 
{
  library(Seurat)
  library(Matrix)
  library(preprocessCore)
  if (is.null(Load_file)) {
    common <- intersect(rownames(bulk_dataset), rownames(sc_dataset))
    if (length(common) == 0) {
      stop("There is no common genes between the given single-cell and bulk samples.")
    }
    if (class(sc_dataset) == "Seurat") {
      sc_exprs <- as.matrix(sc_dataset@assays$RNA$data)
      network <- as.matrix(sc_dataset@graphs$RNA_snn)
    }
    else {
      sc_exprs <- as.matrix(sc_dataset)
      Seurat_tmp <- CreateSeuratObject(sc_dataset)
      Seurat_tmp <- FindVariableFeatures(Seurat_tmp, selection.method = "vst", 
                                         verbose = F)
      Seurat_tmp <- ScaleData(Seurat_tmp, verbose = F)
      Seurat_tmp <- RunPCA(Seurat_tmp, features = VariableFeatures(Seurat_tmp), 
                           verbose = F)
      Seurat_tmp <- FindNeighbors(Seurat_tmp, dims = 1:10, 
                                  verbose = F)
      network <- as.matrix(Seurat_tmp@graphs$RNA_snn)
    }
    diag(network) <- 0
    network[which(network != 0)] <- 1
    dataset0 <- cbind(bulk_dataset[common, ], sc_exprs[common, 
    ])
    dataset1 <- normalize.quantiles(dataset0)
    rownames(dataset1) <- rownames(dataset0)
    colnames(dataset1) <- colnames(dataset0)
    Expression_bulk <- dataset1[, 1:ncol(bulk_dataset)]
    Expression_cell <- dataset1[, (ncol(bulk_dataset) + 1):ncol(dataset1)]
    X <- cor(Expression_bulk, Expression_cell)
    quality_check <- quantile(X)
    print("|**************************************************|")
    print("Performing quality-check for the correlations")
    print("The five-number summary of correlations:")
    print(quality_check)
    print("|**************************************************|")
    if (quality_check[3] < 0.01) {
      warning("The median correlation between the single-cell and bulk samples is relatively low.")
    }
    if (family == "binomial") {
      Y <- as.numeric(phenotype)
      z <- table(Y)
      if (length(z) != length(tag)) {
        stop("The length differs between tags and phenotypes. Please check Scissor inputs and selected regression type.")
      }
      else {
        print(sprintf("Current phenotype contains %d %s and %d %s samples.", 
                      z[1], tag[1], z[2], tag[2]))
        print("Perform logistic regression on the given phenotypes:")
      }
    }
    if (family == "gaussian") {
      Y <- as.numeric(phenotype)
      z <- table(Y)
      if (length(z) != length(tag)) {
        stop("The length differs between tags and phenotypes. Please check Scissor inputs and selected regression type.")
      }
      else {
        tmp <- paste(z, tag)
        print(paste0("Current phenotype contains ", paste(tmp[1:(length(z) - 
                                                                   1)], collapse = ", "), ", and ", tmp[length(z)], 
                     " samples."))
        print("Perform linear regression on the given phenotypes:")
      }
    }
    if (family == "cox") {
      Y <- as.matrix(phenotype)
      if (ncol(Y) != 2) {
        stop("The size of survival data is wrong. Please check Scissor inputs and selected regression type.")
      }
      else {
        print("Perform cox regression on the given clinical outcomes:")
      }
    }
    save(X, Y, network, Expression_bulk, Expression_cell, 
         file = Save_file)
  }
  else {
    load(Load_file)
  }
  if (is.null(alpha)) {
    alpha <- c(0.005, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 
               0.6, 0.7, 0.8, 0.9)
  }
  for (i in 1:length(alpha)) {
    set.seed(123)
    fit0 <- APML1(X, Y, family = family, penalty = "Net", 
                  alpha = alpha[i], Omega = network, nlambda = 100, 
                  nfolds = min(10, nrow(X)))
    fit1 <- APML1(X, Y, family = family, penalty = "Net", 
                  alpha = alpha[i], Omega = network, lambda = fit0$lambda.min)
    if (family == "binomial") {
      Coefs <- as.numeric(fit1$Beta[2:(ncol(X) + 1)])
    }
    else {
      Coefs <- as.numeric(fit1$Beta)
    }
    Cell1 <- colnames(X)[which(Coefs > 0)]
    Cell2 <- colnames(X)[which(Coefs < 0)]
    percentage <- (length(Cell1) + length(Cell2))/ncol(X)
    print(sprintf("alpha = %s", alpha[i]))
    print(sprintf("Scissor identified %d Scissor+ cells and %d Scissor- cells.", 
                  length(Cell1), length(Cell2)))
    print(sprintf("The percentage of selected cell is: %s%%", 
                  formatC(percentage * 100, format = "f", digits = 3)))
    if (percentage < cutoff) {
      break
    }
    cat("\n")
  }
  print("|**************************************************|")
  return(list(para = list(alpha = alpha[i], lambda = fit0$lambda.min, 
                          family = family), Coefs = Coefs, Scissor_pos = Cell1, 
              Scissor_neg = Cell2))
}

Mar 14 '24 10:03 yulijia

I am using scissor to integrate bulk and single cell data of TNBC samples on the basis of survival. But if I am taking basal samples of TCGA(with all the genes) I am getting somewhat good scissor results(scissor plus cells in the expected cell types like epithelial). But not all basal cells are TNBC. So when I filtered TNBC samples of same TCGA dataset from the IHC status (this time took only protein coding genes) , I am getting wrong scissor results(scissor plus cells in normal immune & stromal cells but not in epithelial cells). The bulk datasets are log2(tpm+1).

I do not know whats going wrong. I am using the scissor function given here.I have attached the summery of the dataset.

Size & Coverage

IHC TNBC: 100 samples, 19,938 genes, 17,524 genes common with single-cell Basal TNBC: 194 samples, 59,427 genes, 20,328 genes common with single-cell Single-cell: 30,133 cells, 29,733 genes

Expression Characteristics

IHC TNBC: Dense expression (0-14 range), mean = 3.6, mostly non-zero values Basal TNBC: Sparse expression (0-11 range), mean = 0.6, many zeros Single-cell: Very sparse expression (0-2.1 range), mean = 0.086, mostly zeros

Sample Heterogeneity

IHC: Coefficient of variation = 0.24 (homogeneous samples) Basal: Coefficient of variation = 0.79 (heterogeneous samples)

Survival Data

IHC: 14 events out of 100 samples (14% event rate) Basal: 29 events out of 194 samples (15% event rate)

Jul 29 '25 05:07 sushree-sangita-kar