DoubletFinder
Clear doublets in data after running DoubletFinder
This is the code I ran on my data:
## returns the expected multiplet rate (in %) for a given number of recovered cells (rates taken from the 10x site)
get.10x.multiplets <- function(x){
expected.df <- data.frame(
"Cells.Recovered" = c(500,1000,2000,3000,4000,5000,6000,7000,8000,9000,10000),
"Multiplet.Rate" = c(0.4, 0.8, 1.6, 2.4, 3.2, 4.0, 4.8, 5.6, 6.4, 7.2, 8.0)
)
lm.res <- lm(Multiplet.Rate~Cells.Recovered, data = expected.df)
test.df <- data.frame("Cells.Recovered" = x)
return(predict(lm.res, test.df))
}
## runs DoubletFinder on a seurat object
get.10x.doublets <- function(x){
require(DoubletFinder)
x <- NormalizeData(x)
x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 500)
x <- ScaleData(x, features = VariableFeatures(x))
x <- RunPCA(x, features = VariableFeatures(x), verbose = FALSE)
x <- RunUMAP(x, reduction = "pca", dims = 1:20, reduction.key = "UMAPpca_", reduction.name = "UMAP_PCA")
x <- FindNeighbors(x, reduction = "pca", dims = 1:20)
x <- FindClusters(x, resolution = 0.3)
x.sweep <- paramSweep(x, PCs = 1:20, sct = FALSE)
x.sweep.stats <- summarizeSweep(x.sweep, GT = FALSE)
x.bcmvn <- find.pK(x.sweep.stats)
x.pK <- as.numeric(as.character(x.bcmvn$pK[which.max(x.bcmvn$BCmetric)]))
homotypic.prop <- modelHomotypic(x$seurat_clusters)
mlt.est <- get.10x.multiplets(nrow(x@meta.data))/100
nExp_poi <- round(mlt.est*nrow(x@meta.data))
nExp_poi.adj <- round(nExp_poi*(1-homotypic.prop))
x <- doubletFinder(x, PCs = 1:20, pN = 0.25, pK = x.pK, nExp = nExp_poi, sct = FALSE, reuse.pANN = FALSE)
x <- doubletFinder(x, PCs = 1:20, pN = 0.25, pK = x.pK, nExp = nExp_poi.adj, sct = FALSE,
reuse.pANN = grep("pANN", colnames(x@meta.data), value = TRUE))
doublet.df <- x@meta.data[,grep("DF.classifications", colnames(x@meta.data), value = TRUE)]
colnames(doublet.df) <- c("Estimate", "Adj.Estimate")
doublet.df$Cellname <- rownames(doublet.df)
return(doublet.df)
}
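For reference, the expected-doublet arithmetic inside get.10x.doublets can be checked by hand. The table in get.10x.multiplets is exactly linear (0.8% per 1,000 cells recovered), so the fitted rate reduces to a simple product. A minimal sketch, assuming a hypothetical sample of 5,000 cells and a made-up homotypic proportion of 0.2 (neither value comes from the dataset in question):

```r
# Hypothetical inputs, for illustration only
n.cells <- 5000          # assumed number of cells in the Seurat object
homotypic.prop <- 0.2    # assumed output of modelHomotypic()

# The 10x table used above is linear: 0.8% multiplets per 1,000 cells recovered
mlt.est <- (0.8 * n.cells / 1000) / 100   # rate as a fraction, here 0.04

# Same two steps as in get.10x.doublets
nExp_poi <- round(mlt.est * n.cells)                      # 200 expected doublets
nExp_poi.adj <- round(nExp_poi * (1 - homotypic.prop))    # 160 after homotypic adjustment

print(c(nExp_poi = nExp_poi, nExp_poi.adj = nExp_poi.adj))
```

Since the rate itself grows with the number of cells, the expected doublet count grows quadratically with sample size, so it is worth checking that the cell count passed in corresponds to a single 10x lane rather than a merged object.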
After filtering for Adj.Estimate == "Singlet" in my dataset, I can still identify populations of cells that shouldn't exist, e.g. marker combinations that shouldn't be possible in immune cells.
When I look at the cluster of cells with these markers (cluster 7 on the violin plot), they appear to have significantly higher nCount and nFeature values than the other clusters.
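One way to put numbers on this is to compare per-cluster medians of the QC columns. A minimal sketch in base R, using a made-up metadata table in place of x@meta.data; the column names nCount_RNA and nFeature_RNA assume the default RNA assay, and all values are hypothetical:

```r
# Toy stand-in for x@meta.data (hypothetical values, two clusters only)
meta <- data.frame(
  seurat_clusters = factor(c("0", "0", "0", "7", "7", "7")),
  nCount_RNA      = c(3000, 3500, 3200, 9000, 11000, 10000),
  nFeature_RNA    = c(1200, 1300, 1250, 3000, 3400, 3200)
)

# Median UMI count and median detected genes per cluster
qc.medians <- aggregate(
  cbind(nCount_RNA, nFeature_RNA) ~ seurat_clusters,
  data = meta,
  FUN  = median
)

print(qc.medians)
```

A cluster whose medians sit well above all the others, as cluster 7 does in this toy table, is consistent with residual doublets, although it could also reflect a genuinely larger cell type.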
Can you tell me whether there is something wrong with the code I am running?