scrublet
Threshold setting on a non-bimodal simulated doublet score distribution
Hi there,
First of all, thank you for this very powerful and useful tool. As I am looking for new quality-control metrics, I tested scrublet, and it works perfectly on most of my data (I ran it on 207 single-cell samples from our lab). However, 5 of the 207 samples are problematic, and I am reaching out to you for help explaining the results.
In this post (which is very long, and I apologize in advance for that), I will focus on 1 of the 5 samples as an example.
The sample is ATAC-seq data with ~5000 cells. I used the filtered matrix from cellranger as-is (no normalization or any other preprocessing).
When running scrublet with the automated threshold setting, I get a doublet rate of 82.9%.
Here is the simulated doublet score distribution:
As you can tell, the automatic threshold setting is not working well here (as you have mentioned in many other posts).
So for these particular samples, I decided to set the threshold manually to see how far I could bring the doublet percentage down.
After testing multiple values, the lowest doublet percentage I get is 13% (see the new distribution below):
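For reference, the manual sweep described above can be sketched roughly as follows. This is only an illustration in plain Python, not scrublet's own code: the helper names `doublet_fraction` and `sweep_thresholds` are hypothetical, the scores are made-up toy values rather than data from this sample, and it assumes a cell is called a doublet when its score is strictly above the threshold. With scrublet itself, the equivalent step would presumably be calling `call_doublets(threshold=...)` on the fitted `Scrublet` object for each candidate value.

```python
def doublet_fraction(scores, threshold):
    """Return the fraction of cells whose doublet score exceeds the threshold.

    Assumption: a cell counts as a doublet when score > threshold (strictly).
    """
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(s > threshold for s in scores) / len(scores)


def sweep_thresholds(scores, thresholds):
    """Map each candidate threshold to the resulting doublet fraction."""
    return {t: doublet_fraction(scores, t) for t in thresholds}


# Toy scores for illustration only; these are NOT values from the sample above.
toy_scores = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.45, 0.60, 0.75, 0.90]

for t, frac in sweep_thresholds(toy_scores, [0.1, 0.25, 0.5]).items():
    print(f"threshold={t:.2f}: {frac:.1%} of cells called doublets")
```

A sweep like this only shows how the called-doublet percentage moves with the threshold; when the simulated score distribution is not bimodal, no threshold value has a natural justification, which is the core of the problem here.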
I have just received my DoubletFinder results, which give a doublet rate of 4.8% for the exact same sample.
What can I do to improve my scrublet results (other than varying the threshold)? How do you explain the fact that the majority of the predicted doublets appear to be embedded rather than neotypic (the clusters are distinct from each other; see the UMAP below)?
In this case, what do you recommend?
Sample type: mouse pituitary cells
Thank you in advance for your answer,
Best,
Lilia