Update to the latest simpleaf

Open DongzeHE opened this issue 1 year ago • 5 comments

Description of feature

Dear scrnaseq team,

Thank you very much for including simpleaf in scrnaseq.

Recently, we made major changes to simpleaf, including adding new features and fixing bugs.

  • We introduced piscem, the latest indexer+mapper in our ecosystem that usually uses less memory and time compared with the pufferaligner provided in salmon. Piscem is now the default indexer+mapper in simpleaf.
  • Piscem has brought simpleaf many new features. The most exciting one is the ability to include decoy sequences in the index.
  • We completely reorganized our simpleaf workflow module and provided pre-built workflow templates for analyzing data from CITE-seq, 10X feature barcoding, etc.
  • We replaced pyroe, the augmented reference constructor written in Python, with roers, our Rust implementation, so simpleaf no longer depends on Python.
  • We fixed a bug related to downloading the whitelist from our online database.

We noticed that scrnaseq currently uses an older version of simpleaf, so we would like to discuss the possibility of upgrading to the latest version and exposing the new features it provides.

Tagging @rob-p here in case I missed anything.

Best, Dongze

DongzeHE, Mar 05 '24 20:03

Hi @DongzeHE,

We are, of course, happy to support the latest version of simpleaf and would appreciate a PR. As usual, it would be great to first update the module in nf-core/modules.

> We completely reorganized our simpleaf workflow module and provided pre-built workflow templates for analyzing data from CITE-seq, 10X feature barcoding, etc.

Do you envisage any additional pipeline-level parameters would be needed to support that? Or do you think the --protocol parameter we already have is enough?

Best, Gregor

grst, Mar 06 '24 14:03

Hi @grst,

Thanks for the reply! For the parameters, I think there are two ways to go:

  1. We can discuss which new parameters we should include. IMO there are two:
    • --decoy-paths in simpleaf_index: We can expose this parameter, or a parameter indicating whether the provided genome file should be used as the decoy. Because of the way we designed the decoy, the quant step can skip the decoy part of the index, even if the index was built with a decoy, by setting the --no-poison flag (maybe expose --no-poison as well?).
    • --no-piscem: As we support both piscem (default) and salmon as possible indexer/mapper, it would also be great if we could expose a switch.

Tagging @rob-p here in case I missed anything.

  2. We can expose all simpleaf options in a subsection of params, and assign the default value in simpleaf to them.

It would be great if you could provide some advice on which way we should go, exposing all options or only the most essential ones. Once we figure this out, I am very happy to work on this and submit a PR.

Best, Dongze

DongzeHE, Mar 06 '24 16:03

Sounds good. I think we should only expose the most frequently used options at the pipeline level (and those that require an additional input file). Users can still set arbitrary tool options via a config file, e.g.

process {
    withName: SIMPLEAF {
        ext.args = "--no-piscem"
    }
}
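If the index and quant steps run as separate processes, the same override could be applied to each one. The following is only a sketch: the process names SIMPLEAF_INDEX and SIMPLEAF_QUANT are assumptions about how the module is split, and it assumes both steps accept the --no-piscem switch described earlier in this thread.

```groovy
// Sketch of a custom config passed with `-c custom.config`.
// SIMPLEAF_INDEX and SIMPLEAF_QUANT are assumed process names; check the
// actual names in the pipeline before using these selectors.
process {
    // Build the index with salmon instead of the default piscem.
    withName: 'SIMPLEAF_INDEX' {
        ext.args = '--no-piscem'
    }
    // Map against that salmon index in the quant step as well.
    withName: 'SIMPLEAF_QUANT' {
        ext.args = '--no-piscem'
    }
}
```

Keeping the indexer and mapper choice consistent across both steps matters here, since an index built by one backend would presumably not be usable by the other.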

grst, Mar 07 '24 08:03

Hi @DongzeHE,

I was wondering if there are any updates on this? I'd hope that updating Simpleaf will solve issues like #253.

grst, Aug 08 '24 08:08

Hi @grst, I will create a pull request soon.

DongzeHE, Aug 08 '24 14:08