modules icon indicating copy to clipboard operation
modules copied to clipboard

Adding new module smoothxg

Open heuermh opened this issue 3 years ago • 3 comments

Fixes #1133

PR checklist

Closes #XXX

  • [ ] This comment contains a description of changes (with reason).
  • [ ] If you've fixed a bug or added code that should be tested, add tests!
  • [ ] If you've added a new tool - have you followed the module conventions in the contribution docs
  • [ ] If necessary, include test data in your PR.
  • [ ] Remove all TODO statements.
  • [ ] Emit the versions.yml file.
  • [ ] Follow the naming conventions.
  • [ ] Follow the parameters requirements.
  • [ ] Follow the input/output options guidelines.
  • [ ] Add a resource label
  • [ ] Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • [ ] PROFILE=docker pytest --tag <MODULE> --symlink --keep-workflow-wd
    • [ ] PROFILE=singularity pytest --tag <MODULE> --symlink --keep-workflow-wd
    • [ ] PROFILE=conda pytest --tag <MODULE> --symlink --keep-workflow-wd

heuermh avatar Dec 09 '21 19:12 heuermh

process smoothxg in nf-core/pangenome provides a lot more functionality than this minimal module does; optional consensus graph and MAF outputs, running more than one iteration, and much configuration. We'll need to determine how much of that goes in a shared module and how much is specific to the nf-core/pangenome workflow. Although to be honest that would be the only target workflow at this point.

heuermh avatar Dec 09 '21 19:12 heuermh

As the pangenome workflow is the only target workflow, I would vote to design this module for it. I would suggest to also add optional outputs for the consensus graphs and the MAF file. See https://github.com/nf-core/pangenome/blob/676b05103c4d3fc6cb448b1d74eb3520389aa495/main.nf#L171.

Anyhow, this should wait for https://github.com/nf-core/pangenome/blob/676b05103c4d3fc6cb448b1d74eb3520389aa495/main.nf#L171.

subwaystation avatar Dec 10 '21 13:12 subwaystation

Hi @heuermh, thank you for this PR! We are merging as many modules as possible now due to and impending restructuring of the entire repo. This will mean you will need to update the module to reflect these changes before it can be merged in the future. It appears like this module isn't ready to be merged so if applicable, we are converting it to draft and adding the WIP label. If this isn't the case please let us know and we will try to get the module in before the changes. Thanks again!

JoseEspinosa avatar Sep 28 '22 08:09 JoseEspinosa

Recurse does not work, because we also emit the optional MAF output in the tests which is not declared as input for the module.

So we would have to recurse n-1 times. Then we have to invoke the module one last time setting the MAF output. Would that be possible?

subwaystation avatar Oct 28 '22 07:10 subwaystation

https://github.com/pangenome/smoothxg/releases/tag/v0.6.7 new version incoming!

subwaystation avatar Oct 28 '22 09:10 subwaystation

I don't know if there are any other solutions @heuermh? Edge cases are always a huge pain in Nextflow. I would vote to just stay in bash for this particular module.

EDIT: I missed the slack thread with @maxulysse so we still might be able to find a solution :)

subwaystation avatar Nov 02 '22 09:11 subwaystation

Alright @heuermh. There are 3 ways forward:

  1. We ignore the nf-core rules and just blindly copy the bash script from PGGB.
  2. We write 2 modules. The first one is ignoring the nf-core modules and does not output a versions.yml. The second one works as the smoothxg module works now.
  3. As suggested by @ggabernet modify the tool smoothxg itself so that the iterations are possible when invoking it from the command line. I talked to @AndreaGuarracino and @ekg and this is a possible way forward. If we are lucky, we save I/O.

I would pick number 3, but it will take some coding and testing from my side. Stay tuned!

subwaystation avatar Nov 03 '22 12:11 subwaystation

By option 1 you mean the for loop in bash from earlier? That could work. I tried option 2 a few different ways without success. I would also prefer option 3.

heuermh avatar Nov 03 '22 21:11 heuermh

Yes, I meant the bash loop from earlier. Hopefully option 3 will fly in next week. Else we can just push forward with option 1 and correct later.

subwaystation avatar Nov 04 '22 08:11 subwaystation

If option 1 is already doable and faster (no changes and new tests in smoothxg), I would go in that direction ,and improve this aspect later. I would like to achieve a complete nf-core porting of pggb asap, in-sync with its github/docker flavors (bioconda is almost there too, but not at full 100% power yet).

AndreaGuarracino avatar Nov 04 '22 13:11 AndreaGuarracino

Alright @heuermh. I sank quite some time into option 3 https://github.com/pangenome/smoothxg/tree/iter, but the way forward does not seem trivial. @AndreaGuarracino and I will take a closer look when we get the chance. In the meanwhile, I will push option 1.

subwaystation avatar Nov 08 '22 15:11 subwaystation

Trying something new, I've moved the existing module to SMOOTHXG_ONCE and created a new one SMOOTHXG_ITERATE with a for loop in bash.

It works for me on the test data with iterations = 2 but fails at iterations = 3. Are there more arguments necessary to have iterations work with smoothxg?

Also still todo would be figuring out how to specify iterations and to flesh out the test assertions. I think now the test files all clobber one another.

heuermh avatar Nov 09 '22 20:11 heuermh

I think it's too tedious to split it into 2 components. That's why @AndreaGuarracino and I have https://github.com/pangenome/smoothxg/pull/174. Once this is tested, we can merge, make a tiny new release and just use it here. One module, one process, no questions asked.

subwaystation avatar Nov 10 '22 16:11 subwaystation

Ah sweet! Nice work. We can roll back my offending commits when it is ready.

heuermh avatar Nov 10 '22 16:11 heuermh

Alright, @AndreaGuarracino tested and I also did some small scale tests in https://github.com/nf-core/pangenome/pull/100. Now we need to pray for a new smoothxg release ;)

subwaystation avatar Nov 11 '22 10:11 subwaystation

https://github.com/pangenome/smoothxg/pull/174#issuecomment-1311608084 stay tuned!

subwaystation avatar Nov 11 '22 11:11 subwaystation

Reverted the split commit and updated to the latest container image.

heuermh avatar Nov 17 '22 19:11 heuermh

@heuermh The new release is there! https://anaconda.org/bioconda/smoothxg. Thanks @AndreaGuarracino

subwaystation avatar Dec 15 '22 08:12 subwaystation

I think this is good to go now. Pinging @heuermh @ggabernet @FriederikeHanssen to get one review. Thanks!

subwaystation avatar Jan 25 '23 15:01 subwaystation

Just discovered I can do this on my own. Hehehe.

subwaystation avatar Jan 25 '23 15:01 subwaystation

Thank you, @subwaystation!

heuermh avatar Jan 25 '23 19:01 heuermh