feat: Install subworkflows with modules from different remotes
PR checklist
- [x] This comment contains a description of changes (with reason)
- [ ] `CHANGELOG.md` is updated
- [x] If you've fixed a bug or added code that should be tested, add tests!
- [ ] Documentation in `docs` is updated
Addresses #1927 and #2497
Issue information
Currently, `get_components_to_install` is the function used to download a subworkflow's modules. It relies on simple text parsing to determine where a subworkflow/module should be fetched from and installed.
As outlined in the above issues (and in Slack conversations), it is currently not possible to install subworkflows that depend on modules originating from different remotes. The current workaround is to copy the required modules into the same remote in order to use them.
Implementation
At this moment the implementation is mostly a work-around/hack. I aimed to make the least possible modifications to the source code in order to make the intended behavior functional. Here's how it works:
- It still uses the same text parsing for all modules and subworkflows that don't define a meta.yml file.
- If a subworkflow defines a meta.yml, it can optionally set the `git_remote` and `org_path` keys for a component. This is based on a suggestion given on the original issue. E.g.:
```yaml
components:
  - name: gatk4/mutect2
  - name: gatk4/genomicsdbimport
    org_path: custom_remote
    git_remote: https://github.com/custom_remote/modules.git
```
This info is used to fetch the respective modules from the specified remote. I've set up a dummy repository with a subworkflow that showcases this. If no remote is specified, the module is assumed to come from the same remote as the subworkflow (as is currently the case).
Running `nf-core subworkflows --git-remote https://github.com/jvfe/test-subworkflow-remote install get_genome_annotation` with this branch of nf-core/tools should work. I've set up a basic test for this.
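The parsing described above can be sketched roughly as follows. This is only an illustration under assumed defaults, not the actual nf-core/tools API; the function and constant names are hypothetical:

```python
# Hypothetical sketch: normalise a meta.yml "components" entry, which may be
# either a plain string (use the subworkflow's own remote) or a mapping with
# optional git_remote / org_path overrides.

DEFAULT_REMOTE = "https://github.com/nf-core/modules.git"  # assumed default
DEFAULT_ORG = "nf-core"                                    # assumed default

def parse_component(entry, default_remote=DEFAULT_REMOTE, default_org=DEFAULT_ORG):
    """Return (name, git_remote, org_path) for a components entry."""
    if isinstance(entry, str):
        # Plain string: same remote as the subworkflow itself
        return entry, default_remote, default_org
    # Mapping: fall back to the defaults for any key not given
    return (
        entry["name"],
        entry.get("git_remote", default_remote),
        entry.get("org_path", default_org),
    )

components = [
    "gatk4/mutect2",
    {
        "name": "gatk4/genomicsdbimport",
        "org_path": "custom_remote",
        "git_remote": "https://github.com/custom_remote/modules.git",
    },
]

for c in components:
    print(parse_component(c))
```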
Known issues
- One of the hackier parts is that I'm having to redefine `ModulesRepo` from inside `ComponentInstall`. This is definitely not ideal, so I'd appreciate any suggestions on how to improve this logic.
All in all, I'm eager to hear what everyone thinks. I'm tagging @muffato who has been supervising and helping me in this implementation.
I'm very wary of increasing the idiosyncrasies in the code and I feel that making it such that the code returns variably either strings or dicts and thus necessitates isinstance checks is liable to be difficult to follow/trace through the code in future.
I understand the desire to simply get something working but I think that if we are going to implement this and support it we need to do it right.
I think this also indicates support only for cross-organisational modules and not cross-organisational subworkflows?
> I think this also indicates support only for cross-organisational modules and not cross-organisational subworkflows?
Currently, yes
> I'm very wary of increasing the idiosyncrasies in the code and I feel that making it such that the code returns variably either strings or dicts and thus necessitates isinstance checks is liable to be difficult to follow/trace through the code in future.

Once `get_components_to_install` is updated to return dictionaries for subworkflows too, all the `isinstance(..., dict)` checks will evaluate to true, and all their if statements will be simplified / collapse.
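The proposed refactor can be sketched as follows: the function always yields dictionaries, so the only type check left sits at the parsing boundary and callers never branch on type. This is an illustration of the idea, not the real nf-core/tools implementation:

```python
# Sketch: get_components_to_install always returns dicts, regardless of
# whether meta.yml used the string or the mapping syntax for a component.
# Function name matches the discussion above; the body is illustrative.

def get_components_to_install(entries):
    """Yield one dict per component, with a uniform set of keys."""
    for entry in entries:
        if isinstance(entry, str):  # the single remaining type check, at the boundary
            entry = {"name": entry}
        yield {
            "name": entry["name"],
            "git_remote": entry.get("git_remote"),  # None means "same remote"
            "org_path": entry.get("org_path"),
        }

normalised = list(get_components_to_install(
    ["samtools/sort", {"name": "gatk4/mutect2", "org_path": "custom"}]
))
```

Callers then only ever see dicts, so their `isinstance(..., dict)` branches collapse to the true case and can be deleted.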
> I understand the desire to simply get something working but I think that if we are going to implement this and support it we need to do it right.

It's a question of time management (@jvfe is working part time on this project). We're not sure about the handling of the default org vs the module org in `nf_core/components/install.py` (cf. the creation of `ModulesRepo` instances). We didn't want to invest deeper without clearing that first. Making the change in `get_components_to_install` is easy, but finding the other files that need an update and making sure that all the tests still pass takes a bit longer.
Thank you so much for the review @awgymer and @mirpedrol. As @muffato said, I'm only dedicating a small amount of time every week to this project, hence the slow updates, and also the reason I opted at first for a more "hacky" solution: it was all in order to get a "basic" first version out so the idea could be discussed. But rest assured I agree with all of the concerns pointed out here and I hope to address them with my next updates. The things I'm working on at the moment are:
- Using dictionaries as the only return type for this function
- Supporting subworkflows
- Inferring `org_path` from the remote URL
- Removing `isinstance` checks (which will be made unnecessary by the first point)
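The third item above (inferring `org_path` from the remote URL) could look something like this for https remotes. The helper name is hypothetical, and a real implementation in nf-core/tools would also need to handle ssh-style remotes and other edge cases:

```python
# Sketch: derive the organisation path from a https git remote URL,
# e.g. "https://github.com/custom_remote/modules.git" -> "custom_remote".
from urllib.parse import urlparse

def infer_org_path(git_remote):
    """Return the organisation component of a https git remote URL."""
    path = urlparse(git_remote).path      # "/custom_remote/modules.git"
    return path.strip("/").split("/")[0]  # "custom_remote"

print(infer_org_path("https://github.com/custom_remote/modules.git"))  # custom_remote
```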
In terms of the code implemented here this looks good.
I am unclear about the impact on the following commands:
* `subworkflows remove`
* `subworkflows update`
* `subworkflows lint`

I also still have concerns about this particular style of cross-repo (or indeed single-repo) subworkflow (although I am not intending to reignite a debate over the alternative model I support in this MR!), and at the present stage I wouldn't recommend this as a true solution to anyone wanting cross-repo subworkflows, because I'm unclear on what level of commitment there is from an nf-core standpoint to maintain it and work around the issues it introduces. (This isn't so much something for the authors of this PR to address but rather the wider nf-core team.)
From what I could test (and the unit tests themselves), there's no discernible impact to those commands. Also, I do understand your concerns and agree that there could be a more robust reworking of how cross-repo subworkflows could work (and how single-repo subworkflows are implemented themselves!). In this PR, though, I aimed for the simplest solution, one with minimal impact to the codebase and that could serve as a starting point for people looking to build these components across orgs (as is the case in Sanger Tree of Life), in a way that no changes are required for the way subworkflows are currently implemented. Thank you for your input and your review!
I see a linting test failing, we should update the test here to add a valid fake version.
This is looking good! But it would be good to add a couple more test cases:
* Installing a subworkflow which installs a module from another repo which is also installed from nf-core, for example fastqc, which is already in the pipeline template.
* Removing nf-core fastqc to make sure the fastqc module from the other repo is kept.
* Installing the subworkflow at an old version (sha) and updating it, making sure that all modules from different repos are updated.

And if you can think of other situations that I forgot, let's add them too 🙂
In addition to this PR we will need to add documentation on how to use this, specifying the format of `meta.yml`. We should also modify the JSON schema that we use to lint subworkflows, as the elements of `components` are currently required to be strings. But I think this is not something we should add to the nf-core/modules repo; it should work if the non-nf-core repo adds a JSON schema file too. This should also be documented.
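The relaxed lint rule amounts to: each `components` entry may be either a string or a mapping with a `name` plus optional `git_remote`/`org_path`. A real implementation would express this in the subworkflow JSON schema; the stand-alone check below is only an illustrative sketch of the same rule:

```python
# Sketch of the relaxed validation rule for "components" entries.
# The allowed keys follow the meta.yml example earlier in this PR.

ALLOWED_KEYS = {"name", "git_remote", "org_path"}

def valid_component(entry):
    """True if entry is a plain name or a mapping with name + known keys."""
    if isinstance(entry, str):
        return True
    if isinstance(entry, dict):
        return "name" in entry and set(entry) <= ALLOWED_KEYS
    return False
```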
Hi Júlia, just wanted to let you know that we're still working on this feature! Just hit a few roadblocks that, due to the short time I can allocate to this every week, are taking a bit too long to solve. To sum up:
- The first two test cases you mentioned have been already implemented and are working as intended, without any changes to the code being necessary.
- Although I initially thought the feature wouldn't impact the update cycle, after writing down the test case I could see that it requires some changes to the update module, and this is what we've been working on over the past few weeks. We've already gotten to the point where updating works, but there are still a few things to iron out.
Other than that:
- We're moving the cross-org modules for testing to the nf-core-test org you created; here's the PR: https://github.com/nf-core-test/modules/pull/1. For some reason I wasn't able to request your review on it. The new JSON schema is there as well.
- We've started writing some documentation for the feature, you can check it out here: https://gist.github.com/jvfe/60d228b093839a72a94cf407d579d854#file-readme-md. We weren't sure whether we should submit this to an nf-core repository as a PR, and to which repo (website?).
- Lastly, could you clarify what you meant by adding a valid fake version to the linting test?
Thank you so much for your reviews and your patience while we work on the feature.
Hi @jvfe, that's great! Thanks for implementing the tests.
The documentation can be added to the website. Here you have the docs for the subworkflow install command. And for a more detailed guide like the README you shared, I would add a page in the tutorials, under "external usage" like we do for the configs (see here)
> Lastly, could you clarify what you meant by adding a valid fake version to the linting test?
This was fixed by another PR, so if you update your branch with the dev branch the test should pass.
Hi @jvfe, sorry for the delay, I reviewed your PR on the nf-core-test repo. Do you need help with anything else?
Hi @mirpedrol. Again, sorry for the long delay. I managed to implement all three tests you had requested, so I think the PR is ready for review once again. The feature did require changing sections of the update code, as you'll see.
I also changed all test URLs to the new repo/org you created, and also wrote documentation for the feature, available at https://github.com/nf-core/website/pull/2816
Thank you very much for your patience!
Codecov Report
Attention: Patch coverage is 93.28859% with 10 lines in your changes missing coverage. Please review.
Project coverage is 76.86%. Comparing base (de2c1f7) to head (9c9a2a7). Report is 92 commits behind head on dev.
I am getting to this review again, thanks for adding the tests! I will update the base branch, resolve conflicts, and do a final round of testing, and then we can merge if everything works 🙂
I had to open a new PR after updating the branch because I wasn't able to push to this branch; you can see it here: https://github.com/nf-core/tools/pull/3456. There is still one last thing missing: when removing a mixed subworkflow, only the components installed from that same repo are removed; the components from cross-repos are not removed.
> There is still one last thing missing. When removing a mixed subworkflow, only the components installed from that same repo are removed, the components from cross-repos are not removed.

Should be fixed with https://github.com/nf-core/tools/pull/3456/commits/04b26bb029d677eaa732d53ab317b056fc875dbb
> I had to open a new PR after updating the branch because I wasn't able to push to this branch, you can see it here #3456 There is still one last thing missing. When removing a mixed subworkflow, only the components installed from that same repo are removed, the components from cross-repos are not removed.
Hi! Just saw your comment, sorry - though you fixed it way quicker than I would have 😄. Also, I just merged your changes to this branch. We can also give you access to this fork if necessary (@muffato, I think you'd have to do that, since I don't have the permission).
I've given you, @mirpedrol, write access to the repo. I'd prefer we continue the review here rather than in #3456 so that all the review history is in one place.
Does it also work with `nf-core subworkflows patch`?
I have tested that it works 🙂 `patch` is independent of the included components, so it should not be affected by these changes.
As this PR is looking good and probably heading towards a merge, I just want to gently raise my previous point:

> I'm unclear on what level of commitment to maintain and work around issues this introduces there is from an nf-core standpoint.
This PR adds functionality to make cross-repo subworkflows work but we are still lacking a serious CI testing solution and I'm worried about other footguns the functionality potentially opens up. I'm conscious that greater private repo support is something many (including myself!) want but I also want to ensure we achieve it with stability and long-term viability.
@awgymer What CI testing is missing? This PR does include tests for the new cross-repo functionality. That's in fact how it was developed: @jvfe first wrote (failing) test cases and then implemented the code so that they pass.

> @awgymer What CI testing is missing?
It's a fundamental limitation of this style of subworkflows that you cannot test them easily in their remotes on CI right now and this tools functionality does nothing to address that. Once again my issue is not so much with the actual code of this PR but rather the wider implications this PR has.
Ah right. Well, it sounds like a good idea for another internship/mentorship ;) an nf-core command to install missing cross-org modules so that tests can then run.
How about a CI test where we create a test pipeline and build a mixed subworkflow there?
It's there already: https://github.com/nf-core/tools/pull/3083/files#diff-60cea8a0cf3a284ef26219e6dc9b9b2b55ebb25214cb520b0ae38ef6179436d7R87
It uses modules/subworkflows across nf-core/modules and https://github.com/nf-core-test/modules
Hello! After a chat with the #infrastructure team, we think that this PR can be merged, as the missing CI testing does not affect the nf-core modules repo. A solution for this can be discussed afterwards. For instance, one option would be to create a pipeline, install the subworkflow, and test it there.
Hi everyone sorry for my absence as of late, but I'm glad to see the feature getting some traction and coming closer to a merge!
> A solution for this can be discussed afterwards. For instance, one option would be to create a pipeline, install the subworkflow, and test it there.
This seems like an adequate solution to me, although it'd be better implemented in a separate PR.
The current CI failures seem to be caused by the nf-schema plugin and don't seem related to the feature, so not sure if there's a proper solution from this side.
> The current CI failures seem to be caused by the nf-schema plugin and don't seem related to the feature, so not sure if there's a proper solution from this side.
Hi @jvfe, the failure is due to the edge release of Nextflow; the Nextflow team is working on a fix. Let's wait to see if they fix it soon, otherwise I will consider merging without these tests (as they were passing before). Before the hackathon would be good timing for people to test it out and find possible bugs.
Thank you everyone!
