scrnaseq icon indicating copy to clipboard operation
scrnaseq copied to clipboard

addition of emptyDrops draftdownstream modules

Open rlinchangco opened this issue 2 years ago • 2 comments

This is a draft for adding emptyDrops functionality to the pipeline, using a python ported implementation of the R package: https://github.com/nh3/emptydrops

New additions include: empty_drops.nf empty_drops_filter.nf filter_by_emptydrops.py

The two things that definitely still need work are in empty_drops_filter.nf is to check the inputs to be sure it's the data directory path, and to add an output directory path

PR checklist

  • [ ] This comment contains a description of changes (with reason).
  • [ ] If you've fixed a bug or added code that should be tested, add tests!
    • [ ] If you've added a new tool - have you followed the pipeline conventions in the contribution docs
    • [ ] If necessary, also make a PR on the nf-core/scrnaseq branch on the nf-core/test-datasets repository.
  • [ ] Make sure your code lints (nf-core lint).
  • [ ] Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • [ ] Usage Documentation in docs/usage.md is updated.
  • [ ] Output Documentation in docs/output.md is updated.
  • [ ] CHANGELOG.md is updated.
  • [ ] README.md is updated (including new tool citations and authors/contributors).

rlinchangco avatar Oct 05 '22 13:10 rlinchangco

@fmalmeida Here is the draft we disscussed!

rlinchangco avatar Oct 05 '22 13:10 rlinchangco

nf-core lint overall result: Passed :white_check_mark:

Posted for pipeline commit ae8fb07

+| ✅ 158 tests passed       |+

:white_check_mark: Tests passed:

Run details

  • nf-core/tools version 2.6
  • Run at 2022-11-09 14:54:47

github-actions[bot] avatar Oct 05 '22 13:10 github-actions[bot]

Hi @rlinchangco, I am planning to help you on this one. Can you add me as a contributor to your fork so I can commit to it and help on the PR?

fmalmeida avatar Nov 09 '22 11:11 fmalmeida

General question, what's the plan for the implementation?

EMPTYDROPS_FILTER should come right before matrices conversion and concatenation? Shall it be always on? Or have a params.filter_emptydrops parameter for it?

fmalmeida avatar Nov 09 '22 14:11 fmalmeida

@nf-core-bot fix linting

fmalmeida avatar Nov 09 '22 14:11 fmalmeida

The script expects a parameter called --version which is defined like this:

parser.add_argument(
        "-v",
        "--version",
        dest="version",
        default=3,
        help="Version for features file, defines expected file name (1=kallisto,<v3 cellranger,3=star,>=v3 cellranger).",
    )

I could actually not understand what it expects. It would not work for salmon/alevin? Guess to make it easier to use values already defined on parameters and meta.maps, instead of having a different definition, may this option ask for the aligner name directly? Or the actual protocol?

fmalmeida avatar Nov 09 '22 14:11 fmalmeida

Hi all- FYI there is an 'dropletutils-scripts' package in ~~Bioconductor~~Bioconda that will provide a container to do this already.

The scripts themselves are here: https://github.com/ebi-gene-expression-group/dropletutils-scripts

Here's Nextflow process that uses it (admittedly DSL1). Just thought I'd point this out- might be a shortcut to a module.

Edit: just corrected the above (I meant Bioconda, not Bioconductor, so there's already a Biocontainer)

pinin4fjords avatar Feb 02 '23 15:02 pinin4fjords

Hi @rlinchangco, I saw the modules and re-adapted how it is called in NF. However, I saw the script and the dependencies seem to be complex in a way that multiple conda envs are required, thus we need a mulled container (I already opened a PR for that).

However, we would still lack the https://pypi.org/project/biomart/ package that you are using but does not seen to be available in conda. If that's the case, unfortunately we cannot add the module.

You think the scripts Jonathan mentioned in the previous comment would be of help here?

fmalmeida avatar Feb 17 '23 16:02 fmalmeida

I will open a new PR using another package 'dropletutils-scripts' since this worked well in another private copy. I will add this in the following weeks.

fmalmeida avatar Aug 31 '23 11:08 fmalmeida

Do we need this for 10x? I thought cellranger kind of already uses the same algorithm internally?

grst avatar Aug 31 '23 11:08 grst

As per the issue that raised the PR, it is more so that the other aligners also have a result with that.

fmalmeida avatar Aug 31 '23 11:08 fmalmeida

ok, fair enough

grst avatar Aug 31 '23 11:08 grst

this is the one https://github.com/nf-core/scrnaseq/issues/81

😄 🎯

fmalmeida avatar Aug 31 '23 11:08 fmalmeida