scrublet
Expected doublet rate question
Hi,
Thank you for developing Scrublet. I am using a combined pipeline from your and Wagner's repositories and have a question about the expected doublet rate used to identify putative doublets. I calculate the expected doublet rate by taking the total number of cells loaded onto the Chromium and using their formula to estimate the rate. In this example, the expected doublet rate from loading ~17,600 cells would be about 7.8%. To get this number, I plotted their values in Excel, fitted a simple formula, and solved for the percentage. Does my approach seem correct to you? After this, I adjust the threshold to where the second distribution appears.
Below is the code; Thanks!
Identify and plot putative doublet cells
import numpy as np
import scrublet as scr

np.random.seed(802)  # set random seed for reproducibility
scrub = scr.Scrublet(adata.X, expected_doublet_rate=0.078)
adata.obs['doublet_scores'], adata.obs['predicted_doublets'] = scrub.scrub_doublets(
    min_counts=2, min_cells=3, min_gene_variability_pctl=85, n_prin_comps=30)
scrub.plot_histogram();
print("Doublet-like Cells = {:d}".format(sum(adata.obs['predicted_doublets'])))
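For readers who want to reproduce the rate estimate without Excel, here is a minimal sketch of the fit-and-solve step described in the question. The calibration points are round placeholders, not vendor figures, and the function name `expected_doublet_rate` is hypothetical; substitute the loaded-cells and multiplet-rate table from your own kit's user guide.

```python
import numpy as np

# PLACEHOLDER calibration points (assumption): replace with the
# cells-loaded vs. multiplet-rate table from your kit's user guide.
cells_loaded = np.array([1000.0, 2000.0, 4000.0, 8000.0])
doublet_pct = np.array([1.0, 2.0, 4.0, 8.0])  # percent, placeholder values


def expected_doublet_rate(n_loaded, loaded=cells_loaded, pct=doublet_pct):
    """Fit a straight line to the table and return the rate as a fraction."""
    slope, intercept = np.polyfit(loaded, pct, deg=1)
    return float(slope * n_loaded + intercept) / 100.0


# Scrublet's expected_doublet_rate argument takes a fraction, not a percent.
rate = expected_doublet_rate(5000)
print(f"expected_doublet_rate = {rate:.3f}")
```

A linear fit is an assumption here; if the published table is visibly curved, `np.interp` over the same two arrays avoids imposing a functional form.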
Hi @s849, apologies for the very delayed response. Your approach does sound correct to me. As an aside, the results shouldn't depend all that much on the input expected_doublet_rate, as long as you are setting the threshold between the two peaks of the distribution.
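One way to make "between the two peaks" concrete is sketched below. It assumes you have read approximate peak positions off the histogram by eye. The helper `valley_threshold` is hypothetical and not part of Scrublet, and the scores are synthetic stand-ins for `adata.obs['doublet_scores']`.

```python
import numpy as np


def valley_threshold(scores, low_peak, high_peak, n_bins=50):
    """Return the center of the emptiest histogram bin between two peaks.

    low_peak and high_peak are the user's visual estimates of the two
    peak positions on the doublet-score histogram.
    """
    scores = np.asarray(scores, dtype=float)
    counts, edges = np.histogram(scores, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    idx = np.flatnonzero((centers > low_peak) & (centers < high_peak))
    if idx.size == 0:
        raise ValueError("no histogram bins fall between the two peaks")
    # Several bins may tie for the minimum count; take the middle one.
    tied = idx[counts[idx] == counts[idx].min()]
    return float(centers[tied[len(tied) // 2]])


# Synthetic bimodal scores (assumption): a large low-score peak and a
# small high-score peak, standing in for real doublet scores.
rng = np.random.default_rng(0)
synthetic_scores = np.concatenate([
    rng.normal(0.10, 0.03, 9000),
    rng.normal(0.60, 0.08, 700),
]).clip(0, 1)

threshold = valley_threshold(synthetic_scores, low_peak=0.10, high_peak=0.60)
print(f"threshold = {threshold:.2f}")
```

With real data, the chosen value can then be applied with `scrub.call_doublets(threshold=threshold)`, followed by `scrub.plot_histogram()` to check that the cutoff sits in the valley.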