
Threshold setting on non-bimodal simulated doublet score distribution

Open liliay opened this issue 2 years ago • 0 comments

Hi there,

First of all, thank you for this very powerful and useful tool. As I am looking for new quality control metrics, I tested scrublet and it works perfectly on most of my data (I tested it on 207 lab single-cell samples). However, 5 of the 207 samples are causing me problems, and I am reaching out to you for help explaining the results.

In this post (very long, and I apologize in advance for that), I will focus on 1 of the 5 samples as an example. My sample contains ~5000 cells of ATAC-seq data. I used the filtered matrix from cellranger (no normalization or any other preprocessing). When running scrublet with automatic threshold setting, I get a doublet rate of 82.9%. Here is the simulated doublet score distribution: [image]

As you can tell, the automatic threshold setting is not working well here (as you have mentioned in many other posts). So for these particular samples I decided to set the threshold manually to see how far I could reduce the doublet rate. After testing multiple values, the minimum doublet rate I get is 13% (see the new distribution below): [image]
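For reference, here is a minimal sketch of the kind of manual threshold sweep described above. It is not the exact code I ran: the scores are randomly generated stand-ins (an assumption, not my real data), and `doublet_fraction` is a hypothetical helper. In scrublet itself, as far as I understand the API, the equivalent step would be calling `scrub.call_doublets(threshold=t)` for each candidate threshold after `scrub.scrub_doublets()`.

```python
import random


def doublet_fraction(scores, threshold):
    """Fraction of cells whose doublet score exceeds the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


# Hypothetical per-cell doublet scores; in a real run these would be the
# observed scores returned by scrublet for the ~5000 cells in the sample.
random.seed(0)
scores = [random.betavariate(2, 8) for _ in range(5000)]

# Sweep a few candidate thresholds and record the predicted doublet rate.
thresholds = [0.1, 0.2, 0.3, 0.4, 0.5]
sweep = {t: doublet_fraction(scores, t) for t in thresholds}

for t, frac in sweep.items():
    print(f"threshold={t:.2f}  predicted doublets={100 * frac:.1f}%")
```

Raising the threshold can only lower (or keep) the predicted doublet rate, so the sweep shows how sensitive the result is to the cutoff when the simulated score distribution has no clear valley between two modes.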

I have just received my DoubletFinder results, and they give a doublet rate of 4.8% for the exact same sample.

What can I do to improve my scrublet results (other than varying the threshold)? How do you explain the fact that the majority of predicted doublets appear to be embedded rather than neotypic (the clusters are distinct from each other, see the UMAP below)? [image] What would you recommend in this case?

Sample type: mouse pituitary cells

Thank you in advance for your answer,

Best,

Lilia

liliay, Mar 30 '22 15:03