Expected doublet rate question

Open s849 opened this issue 4 years ago • 1 comments

Hi,

Thank you for developing Scrublet. I am using a combined pipeline from your and Wagner's repositories and had a question about the expected doublet rate used to identify putative doublets. I am calculating the expected doublet rate by taking the total number of cells loaded onto the Chromium and using their formula to estimate the rate. In this example, the expected doublet rate from loading ~17,600 cells would be about 7.8%. To get this number I plotted their values in Excel, fitted a simple formula, and solved for the percentage. Does this approach seem correct to you? After this, I adjust the threshold to where the second distribution begins to appear.

Below is the code. Thanks!

Identify and plot putative doublet cells

np.random.seed(802)  # set random seed for reproducibility
scrub = scr.Scrublet(adata.X, expected_doublet_rate=0.078)
adata.obs['doublet_scores'], adata.obs['predicted_doublets'] = scrub.scrub_doublets(
    min_counts=2,
    min_cells=3,
    min_gene_variability_pctl=85,
    n_prin_comps=30,
)
scrub.plot_histogram();
print("Doublet-like Cells = {:d}".format(sum(adata.obs['predicted_doublets'])))

Mar 12 '20 19:03 s849
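
The estimate described in the question (fit a line to the vendor's cells-loaded vs. multiplet-rate table, then solve for the number of cells loaded) can be sketched in Python instead of Excel. This is a minimal sketch, not the poster's actual spreadsheet: the function names `fit_line` and `expected_doublet_rate` are hypothetical, and the table values are made-up placeholders that must be replaced with the numbers from the vendor's user guide.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def expected_doublet_rate(cells_loaded, table):
    """Return the fitted rate as a fraction (e.g. 0.05 for 5%).

    `table` is a list of (cells_loaded, multiplet_rate_percent) pairs.
    """
    xs = [cells for cells, _ in table]
    ys = [rate for _, rate in table]
    slope, intercept = fit_line(xs, ys)
    return (slope * cells_loaded + intercept) / 100.0


# PLACEHOLDER values for illustration only (assumption): these are NOT
# the vendor's published numbers. Substitute the real table before use.
placeholder_table = [(2000, 1.0), (4000, 2.0), (8000, 4.0), (16000, 8.0)]

rate = expected_doublet_rate(10000, placeholder_table)
print(f"expected_doublet_rate = {rate:.3f}")
```

The returned fraction is in the form that the `expected_doublet_rate` argument in the code above takes (0.078 for 7.8%).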

Hi @s849, apologies for the very delayed response. Your approach does sound correct to me. As an aside, the results shouldn't depend all that much on the input expected_doublet_rate, as long as you are setting the threshold between the two peaks of the distribution.

May 02 '20 00:05 swolock
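
Placing the threshold between the two peaks of the score distribution is usually done by eye on the histogram. As a rough cross-check, a first guess can be computed with Otsu's method, which picks the split that maximizes the between-class variance of a bimodal sample. This is a sketch under stated assumptions: Otsu's method is a heuristic swapped in here, not something proposed in this thread, `otsu_threshold` is a hypothetical helper, and the scores below are synthetic. The result should always be compared against the plotted histogram.

```python
def otsu_threshold(scores):
    """Return the split point that maximizes between-class variance."""
    values = sorted(scores)
    n = len(values)
    total = sum(values)
    best_var, best_threshold = -1.0, None
    running = 0.0
    for i in range(1, n):
        running += values[i - 1]
        if values[i] == values[i - 1]:
            continue  # only split between distinct values
        w0, w1 = i, n - i                # class sizes
        m0 = running / w0                # mean of the lower class
        m1 = (total - running) / w1      # mean of the upper class
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_var = between
            best_threshold = (values[i - 1] + values[i]) / 2
    if best_threshold is None:
        raise ValueError("need at least two distinct score values")
    return best_threshold


# Synthetic bimodal scores (assumption): a large low-score peak and a
# small high-score peak standing in for real doublet scores.
synthetic_scores = [0.10] * 100 + [0.12] * 80 + [0.60] * 20 + [0.62] * 15

threshold = otsu_threshold(synthetic_scores)
print(f"suggested threshold = {threshold:.2f}")

# With a fitted Scrublet object, a manually chosen threshold can be
# applied with: scrub.call_doublets(threshold=threshold)
```

Because the two peaks are often very unequal in size, an automatic split like this can land too close to the larger peak, so it is best treated as a starting point for manual adjustment.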
