modules
modules copied to clipboard
Adding new module smoothxg
Fixes #1133
PR checklist
Closes #XXX
- [ ] This comment contains a description of changes (with reason).
- [ ] If you've fixed a bug or added code that should be tested, add tests!
- [ ] If you've added a new tool - have you followed the module conventions in the contribution docs
- [ ] If necessary, include test data in your PR.
- [ ] Remove all TODO statements.
- [ ] Emit the
versions.yml
file. - [ ] Follow the naming conventions.
- [ ] Follow the parameters requirements.
- [ ] Follow the input/output options guidelines.
- [ ] Add a resource
label
- [ ] Use BioConda and BioContainers if possible to fulfil software requirements.
- Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
- [ ]
PROFILE=docker pytest --tag <MODULE> --symlink --keep-workflow-wd
- [ ]
PROFILE=singularity pytest --tag <MODULE> --symlink --keep-workflow-wd
- [ ]
PROFILE=conda pytest --tag <MODULE> --symlink --keep-workflow-wd
- [ ]
process smoothxg
in nf-core/pangenome provides a lot more functionality than this minimal module does; optional consensus graph and MAF outputs, running more than one iteration, and much configuration. We'll need to determine how much of that goes in a shared module and how much is specific to the nf-core/pangenome workflow. Although to be honest that would be the only target workflow at this point.
As the pangenome workflow is the only target workflow, I would vote to design this module for it. I would suggest to also add optional outputs for the consensus graphs and the MAF file. See https://github.com/nf-core/pangenome/blob/676b05103c4d3fc6cb448b1d74eb3520389aa495/main.nf#L171.
Anyhow, this should wait for https://github.com/nf-core/pangenome/blob/676b05103c4d3fc6cb448b1d74eb3520389aa495/main.nf#L171.
Hi @heuermh, thank you for this PR! We are merging as many modules as possible now due to and impending restructuring of the entire repo. This will mean you will need to update the module to reflect these changes before it can be merged in the future. It appears like this module isn't ready to be merged so if applicable, we are converting it to draft and adding the WIP label. If this isn't the case please let us know and we will try to get the module in before the changes. Thanks again!
Recurse does not work, because we also emit the optional MAF output in the tests which is not declared as input for the module.
So we would have to recurse n-1 times. Then we have to invoke the module one last time setting the MAF output. Would that be possible?
https://github.com/pangenome/smoothxg/releases/tag/v0.6.7 new version incoming!
I don't know if there are any other solutions @heuermh? Edge cases are always a huge pain in Nextflow.
I would vote to just stay in bash
for this particular module.
EDIT: I missed the slack thread with @maxulysse so we still might be able to find a solution :)
Alright @heuermh. There are 3 ways forward:
- We ignore the nf-core rules and just blindly copy the bash script from PGGB.
- We write 2 modules. The first one is ignoring the nf-core modules and does not output a versions.yml. The second one works as the smoothxg module works now.
- As suggested by @ggabernet modify the tool smoothxg itself so that the iterations are possible when invoking it from the command line. I talked to @AndreaGuarracino and @ekg and this is a possible way forward. If we are lucky, we save I/O.
I would pick number 3, but it will take some coding and testing from my side. Stay tuned!
By option 1 you mean the for loop in bash from earlier? That could work. I tried option 2 a few different ways without success. I would also prefer option 3.
Yes, I meant the bash loop from earlier. Hopefully option 3 will fly in next week. Else we can just push forward with option 1 and correct later.
If option 1 is already doable and faster (no changes and new tests in smoothxg
), I would go in that direction ,and improve this aspect later. I would like to achieve a complete nf-core porting of pggb
asap, in-sync with its github/docker flavors (bioconda is almost there too, but not at full 100% power yet).
Alright @heuermh. I sank quite some time into option 3 https://github.com/pangenome/smoothxg/tree/iter, but the way forward does not seem trivial. @AndreaGuarracino and I will take a closer look when we get the chance. In the meanwhile, I will push option 1.
Trying something new, I've moved the existing module to SMOOTHXG_ONCE
and created a new one SMOOTHXG_ITERATE
with a for loop in bash.
It works for me on the test data with iterations = 2
but fails at iterations = 3
. Are there more arguments necessary to have iterations work with smoothxg?
Also still todo would be figuring out how to specify iterations
and to flesh out the test assertions. I think now the test files all clobber one another.
I think it's too tedious to split it into 2 components. That's why @AndreaGuarracino and I have https://github.com/pangenome/smoothxg/pull/174. Once this is tested, we can merge, make a tiny new release and just use it here. One module, one process, no questions asked.
Ah sweet! Nice work. We can roll back my offending commits when it is ready.
Alright, @AndreaGuarracino tested and I also did some small scale tests in https://github.com/nf-core/pangenome/pull/100.
Now we need to pray for a new smoothxg
release ;)
https://github.com/pangenome/smoothxg/pull/174#issuecomment-1311608084 stay tuned!
Reverted the split commit and updated to the latest container image.
@heuermh The new release is there! https://anaconda.org/bioconda/smoothxg. Thanks @AndreaGuarracino
I think this is good to go now. Pinging @heuermh @ggabernet @FriederikeHanssen to get one review. Thanks!
Just discovered I can do this on my own. Hehehe.
Thank you, @subwaystation!