
Call empty droplets

Open grst opened this issue 2 years ago • 1 comments

Is your feature request related to a problem? Please describe

I think it could be nice to have a step to distinguish empty droplets from actual cells.

As far as I know, Alevin/Kallisto only perform cell calling based on "knee plots", while cellranger implements the emptyDrops algorithm. According to the emptyDrops paper, the method clearly outperforms filtering based on knee plots.
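For readers unfamiliar with knee-plot filtering, here is a minimal sketch of the general idea, not the actual implementation used by Alevin or Kallisto. It assumes one common heuristic: rank barcodes by total UMI count, plot rank against count on log-log axes, and take the point farthest above the straight line joining the first and last barcode as the knee. The function names `knee_threshold` and `call_cells` are hypothetical.

```python
import math


def knee_threshold(umi_counts):
    """Return the UMI count at the knee of the log-log barcode rank curve.

    Heuristic (an assumption, not any specific tool's method): the knee is
    the point with the largest distance above the chord that joins the
    highest-ranked and lowest-ranked barcodes.
    """
    counts = sorted((c for c in umi_counts if c > 0), reverse=True)
    if len(counts) < 3:
        raise ValueError("need at least three barcodes with non-zero counts")

    xs = [math.log10(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log10(c) for c in counts]
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]

    def height_above_chord(i):
        # Cross product of the chord direction with the vector to point i;
        # positive values lie above the chord. The constant chord length is
        # omitted because it does not change which point is the maximum.
        return dx * (ys[i] - ys[0]) - dy * (xs[i] - xs[0])

    knee_index = max(range(len(counts)), key=height_above_chord)
    return counts[knee_index]


def call_cells(umi_counts):
    """Keep barcodes whose total UMI count is at or above the knee."""
    threshold = knee_threshold(umi_counts)
    return [c for c in umi_counts if c >= threshold]
```

A hard cutoff like this discards any real cell with low RNA content that falls below the knee, which is the weakness emptyDrops addresses by testing each barcode against an ambient RNA profile instead.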

Describe the solution you'd like

Implement a process downstream of the aligner subworkflows running the emptyDrops algorithm.

Describe alternatives you've considered

This kind of filtering could be left to downstream pipelines such as #scflow. However, IMO, it would still make sense to have this as a default even when not using scflow for downstream analysis.

Additional context

STARsolo implements the emptyDrops algorithm as of version 2.7.8a, which can be activated using the --soloCellFilter EmptyDrops_CR option: https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#emptydrop-like-filtering
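As a rough illustration of where that option would sit in a STARsolo call, the sketch below only assembles an argument list and does not run STAR. The one flag taken from this issue is `--soloCellFilter EmptyDrops_CR`; the remaining flags are standard STARsolo options as I understand them, and the helper name and all paths are hypothetical placeholders.

```python
import shlex


def starsolo_emptydrops_args(genome_dir, cdna_fastq, barcode_fastq, whitelist):
    """Build a STARsolo command line with EmptyDrops-like cell filtering.

    Only `--soloCellFilter EmptyDrops_CR` comes from the issue text; the
    other options are assumptions about a typical droplet-based run.
    """
    return [
        "STAR",
        "--genomeDir", genome_dir,
        # STARsolo expects the cDNA read first, then the barcode/UMI read.
        "--readFilesIn", cdna_fastq, barcode_fastq,
        "--soloType", "CB_UMI_Simple",
        "--soloCBwhitelist", whitelist,
        "--soloCellFilter", "EmptyDrops_CR",
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    args = starsolo_emptydrops_args(
        "star_index", "sample_R2.fastq.gz", "sample_R1.fastq.gz", "whitelist.txt"
    )
    print(shlex.join(args))
```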

I don't know if emptyDrops is still state-of-the-art or if there's something more advanced by now.

grst avatar Jan 01 '22 16:01 grst

See comments by Rob Patro on Slack: https://nfcore.slack.com/archives/CHN5BV5DW/p1643209151035000

grst avatar Jan 26 '22 15:01 grst

Is this related to what's happening in PR #153 ?

fmalmeida avatar Nov 10 '22 08:11 fmalmeida

Hi all, FYI there is a 'dropletutils-scripts' package in Bioconda that will provide a container to do this already.

The scripts themselves are here: https://github.com/ebi-gene-expression-group/dropletutils-scripts

Here's a Nextflow process that uses it (admittedly DSL1). Just thought I'd point this out, as it might be a shortcut to a module.

pinin4fjords avatar Feb 02 '23 15:02 pinin4fjords

I will restart the PR that was in progress for this, since it had become too problematic, and I will use the suggested 'dropletutils-scripts' package from Bioconda.

Using this, we were able to get it working in a private copy of the pipeline. I can then work on bringing the script and the module here.

fmalmeida avatar Aug 31 '23 11:08 fmalmeida

I have a PoC running for the main aligners. Once I figure it out for cellrangerarc I will open a PR with the whole description of the changes for discussion.

fmalmeida avatar Feb 12 '24 09:02 fmalmeida

PR merged 😄

fmalmeida avatar Mar 18 '24 17:03 fmalmeida