
Clear doublets in data after running DoubletFinder

Open wudustan opened this issue 1 week ago • 0 comments

This is the code I ran on my data:

## returns the expected multiplet rate (in %) for a given number of recovered cells (rates taken from the 10x site)
get.10x.multiplets <- function(x){
  expected.df <- data.frame(
    "Cells.Recovered" = c(500,1000,2000,3000,4000,5000,6000,7000,8000,9000,10000),
    "Multiplet.Rate" = c(0.4, 0.8, 1.6, 2.4, 3.2, 4.0, 4.8, 5.6, 6.4, 7.2, 8.0)
  )
  lm.res <- lm(Multiplet.Rate~Cells.Recovered, data = expected.df)
  test.df <- data.frame("Cells.Recovered" = x)
  return(predict(lm.res, test.df))
}

## runs DoubletFinder on a Seurat object and returns the per-cell classifications
get.10x.doublets <- function(x){
  library(Seurat)
  library(DoubletFinder)

  x <- NormalizeData(x)
  x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 500)
  x <- ScaleData(x, features = VariableFeatures(x))
  x <- RunPCA(x, features = VariableFeatures(x), verbose = FALSE)
  x <- RunUMAP(x, reduction = "pca", dims = 1:20, reduction.key = "UMAPpca_", reduction.name = "UMAP_PCA")
  x <- FindNeighbors(x, reduction = "pca", dims = 1:20)
  x <- FindClusters(x, resolution = 0.3)

  x.sweep <- paramSweep(x, PCs = 1:20, sct = FALSE)
  x.sweep.stats <- summarizeSweep(x.sweep, GT = FALSE)
  x.bcmvn <- find.pK(x.sweep.stats)
  x.pK <- as.numeric(as.character(x.bcmvn$pK[which.max(x.bcmvn$BCmetric)]))
  homotypic.prop <- modelHomotypic(x$seurat_clusters)
  mlt.est <- get.10x.multiplets(nrow(x@meta.data))/100
  nExp_poi <- round(mlt.est*nrow(x@meta.data))
  nExp_poi.adj <- round(nExp_poi*(1-homotypic.prop))

  x <- doubletFinder(x, PCs = 1:20, pN = 0.25, pK = x.pK, nExp = nExp_poi, sct = FALSE, reuse.pANN = FALSE)
  x <- doubletFinder(x, PCs = 1:20, pN = 0.25, pK = x.pK, nExp = nExp_poi.adj, sct = FALSE, 
                     reuse.pANN = grep("pANN", colnames(x@meta.data), value = TRUE))
  doublet.df <- x@meta.data[,grep("DF.classifications", colnames(x@meta.data), value = TRUE)]
  colnames(doublet.df) <- c("Estimate", "Adj.Estimate")
  doublet.df$Cellname <- rownames(doublet.df)
  return(doublet.df)
}
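
To make the nExp arithmetic concrete, here is a minimal base R sketch of the same calculation outside the Seurat pipeline. The cell count (5000) and the homotypic proportion (0.25) are hypothetical values chosen only for illustration; in the function above they come from the object and from modelHomotypic(). Because the 10x table is exactly linear (0.8% per 1000 cells), 5000 cells give a rate of about 4%, i.e. 200 expected doublets, or 150 after the homotypic adjustment.

```r
## toy check of the nExp arithmetic; base R only, no Seurat object needed
cells.recovered <- c(500, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000)
multiplet.rate  <- c(0.4, 0.8, 1.6, 2.4, 3.2, 4.0, 4.8, 5.6, 6.4, 7.2, 8.0)
fit <- lm(multiplet.rate ~ cells.recovered)

n.cells <- 5000          # hypothetical number of cells in the object
rate.pct <- unname(predict(fit, data.frame(cells.recovered = n.cells)))
nExp_poi <- round(rate.pct / 100 * n.cells)

homotypic.prop <- 0.25   # hypothetical; modelHomotypic() would supply the real value
nExp_poi.adj <- round(nExp_poi * (1 - homotypic.prop))

print(c(rate.pct = rate.pct, nExp_poi = nExp_poi, nExp_poi.adj = nExp_poi.adj))
```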

After filtering my dataset for Adj.Estimate == "Singlet", I can still identify populations of cells that shouldn't exist, e.g. marker combinations that shouldn't be possible in immune cells.
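
The filtering step can be sketched roughly as follows. This is a minimal illustration on a made-up data frame with the same three columns that get.10x.doublets() returns; the cell names and calls are invented, and the final subset() line is an assumption about how the barcodes would be applied to the Seurat object.

```r
## toy stand-in for the data frame returned by get.10x.doublets(); values are made up
doublet.df <- data.frame(
  Estimate     = c("Singlet", "Doublet", "Doublet", "Singlet"),
  Adj.Estimate = c("Singlet", "Singlet", "Doublet", "Singlet"),
  Cellname     = c("cell_1", "cell_2", "cell_3", "cell_4"),
  stringsAsFactors = FALSE
)

## keep only the barcodes called "Singlet" under the homotypic-adjusted estimate
singlet.cells <- doublet.df$Cellname[doublet.df$Adj.Estimate == "Singlet"]
print(singlet.cells)

## on the real object this would be followed by something like:
## x <- subset(x, cells = singlet.cells)
```

Note that the adjusted estimate uses a smaller nExp, so a cell like cell_2 here can be a "Doublet" under Estimate and still pass a filter on Adj.Estimate.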

When I look at the cluster of cells with these markers (cluster 7 on the violin plot), they appear to have significantly higher nCount and nFeature values than the other clusters.

[Image: violin plot of nCount and nFeature per cluster]

Can you tell me whether there is something wrong with the code I am running?

wudustan, Feb 21 '25 15:02