modules icon indicating copy to clipboard operation
modules copied to clipboard

pbsv

Open tanyasarkjain opened this issue 1 year ago • 3 comments

PR checklist

Closes #XXX

  • [ ] This comment contains a description of changes (with reason).
  • [ ] If you've fixed a bug or added code that should be tested, add tests!
  • [ ] If you've added a new tool - have you followed the module conventions in the contribution docs
  • [ ] If necessary, include test data in your PR.
  • [] Remove all TODO statements.
  • [x] Emit the versions.yml file.
  • [x] Follow the naming conventions.
  • [x] Follow the parameters requirements.
  • [x] Follow the input/output options guidelines.
  • [ ] Add a resource label
  • [ ] Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • [x] nf-core modules test <MODULE> --profile docker
      • [ ] nf-core modules test <MODULE> --profile singularity
      • [ ] nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • [ ] nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • [ ] nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • [ ] nf-core subworkflows test <SUBWORKFLOW> --profile conda

tanyasarkjain avatar Oct 22 '24 07:10 tanyasarkjain

@nf-core-bot fix linting

tanyasarkjain avatar Oct 22 '24 07:10 tanyasarkjain

Should this perhaps be two modules, pbsv/discover and pbsv/call? Especially if pbsv discover is single-threaded while psbv call can use multiple threads. Do you have a sense for how long the discover step takes on real data?

fellen31 avatar Oct 25 '24 09:10 fellen31

I think typically discover and call are used together - I can't think of a scenario where you would just want to discover but not call, and for calling with pbsv you must discover first, so I think it makes sense to keep them together, both are needed for structural variant calling

tanyasarkjain avatar Oct 25 '24 17:10 tanyasarkjain

I think typically discover and call are used together - I can't think of a scenario where you would just want to discover but not call, and for calling with pbsv you must discover first, so I think it makes sense to keep them together, both are needed for structural variant calling

To align reads with pbmm2 you first need to run pbmm2 index to create an index, and then pbmm2 align to align the reads. They should still be made as separate modules (same as bwamem2 or minimap2). Partly because indexing uses a few cores for a couple of minutes, while aligning can take several hours. The align command might fail, while indexing worked, and by having separate modules you could resume the pipeline without having to re-index.

The cnvpytor modules are another example of where you would probably never run the subcommands alone, but they are still added as separate modules, and can still be run one after the other in a workflow.

fellen31 avatar Oct 27 '24 20:10 fellen31

Okay - I can work on making it into 2 modules

tanyasarkjain avatar Oct 27 '24 22:10 tanyasarkjain

Nice! A few comments, but thanks for making the effort to split this into two modules! ⭐

Thanks for the review!! - I believe I have resolved all your comments now - I also made a new modules pbsv/call

tanyasarkjain avatar Oct 30 '24 18:10 tanyasarkjain