Automatic tool requirement updates
We've been working recently in Freiburg on a GitHub bot to automate updating Galaxy tool requirements - the idea is pretty similar to @BiocondaBot, which makes a lot of useful updates automatically to bioconda recipes. We have now deployed it on the galaxycompchem tools repo and @bgruening's galaxytools repo, and would eventually like to also deploy it on tools-iuc, but obviously that's a much bigger step which affects many more people, hence the creation of this issue.
First of all, here are a couple of example PRs to demonstrate what the bot is meant to be doing:
- https://github.com/galaxycomputationalchemistry/galaxy-tools-compchem/pull/109/
- https://github.com/bgruening/galaxytools/pull/1085
And here is an example of what kind of changes to the tools-iuc repo would be proposed by the bot: https://github.com/simonbray/tools-iuc/commit/99a2ec005152e7f0a5617c05c6e1a0e692783b50
In short, the bot is run by a regular (currently weekly) CI job and creates a new PR for each tool repo found by planemo ci_find_repos in the GitHub repo. It requires a @TOOL_VERSION@ token to be present in the tool wrapper and used twice, once in the tool version and once in the requirements section. The assumption is that the requirement with this token is the 'main requirement' (as it has the same value as the tool version itself), and if this requirement is out of date (i.e. a newer version exists in bioconda or conda-forge), all requirements are updated. The bot subsequently also updates the test data automatically if necessary, see for example this commit: https://github.com/galaxycomputationalchemistry/galaxy-tools-compchem/pull/110/commits/e20fae09022827e047f293bc15823980898a9f34. As with @BiocondaBot, merging has to be done by a human.
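To make the convention concrete, here is a minimal Python sketch of the detection step (not the planemo implementation). It assumes a wrapper whose source XML uses the @TOOL_VERSION@ token literally; the tool and package names are invented, and in real wrappers the token often lives in a separate macros.xml.

```python
# Minimal sketch: the 'main requirement' is the <requirement> whose version
# attribute is the @TOOL_VERSION@ token, i.e. the same value as the tool version.
import xml.etree.ElementTree as ET

WRAPPER = """\
<tool id="example" name="Example" version="@TOOL_VERSION@+galaxy0">
    <macros>
        <token name="@TOOL_VERSION@">1.2.3</token>
    </macros>
    <requirements>
        <requirement type="package" version="@TOOL_VERSION@">example-package</requirement>
        <requirement type="package" version="4.5">some-dependency</requirement>
    </requirements>
</tool>
"""

root = ET.fromstring(WRAPPER)
current_version = next(
    tok.text for tok in root.iter("token") if tok.get("name") == "@TOOL_VERSION@"
)
main_requirements = [
    req.text
    for req in root.iter("requirement")
    if req.get("version") == "@TOOL_VERSION@"
]
print(current_version, main_requirements)   # 1.2.3 ['example-package']
```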
The code is currently in a branch of planemo, implemented as a planemo autoupdate subcommand - there is an open PR here: https://github.com/galaxyproject/planemo/pull/1065 (any comments or suggestions are welcome). It is basically just a Python script for parsing XML. There are some complexities though, particularly for dealing with macros and with wrapper versioning (the latter was already discussed a bit in https://github.com/galaxy-iuc/standards/pull/59). A lot of things can be customised, e.g. lists of tools and/or requirements to skip updates for, which conda channels to check, and whether to update the test data as part of the update - the PR contains some brief documentation.
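For reference, the "is a newer version available?" check can be sketched roughly as below. This is a simplification of what the subcommand actually does: it assumes the public anaconda.org API and its latest_version field, and the package name and current version are placeholders.

```python
# Rough sketch only: query anaconda.org for the newest version of a package
# in the configured channels and compare it with the wrapper's @TOOL_VERSION@.
import requests

CHANNELS = ["conda-forge", "bioconda"]  # the channels to check are configurable

def latest_conda_version(package):
    for channel in CHANNELS:
        resp = requests.get(f"https://api.anaconda.org/package/{channel}/{package}")
        if resp.ok:
            return resp.json().get("latest_version")
    return None

current = "1.2.3"                                 # value of @TOOL_VERSION@
latest = latest_conda_version("example-package")  # placeholder package name
if latest and latest != current:
    print(f"main requirement out of date: {current} -> {latest}; update all requirements")
```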
The bot itself is run by GitHub actions which are defined here: https://github.com/planemo-autoupdate/autoupdate/
As a next step (not really relevant here, but maybe interesting) we would like to implement a similar autoupdate bot for Galaxy workflows. Then we would have full automation of the whole update chain, from source code release > conda package release > Galaxy tool > Galaxy workflow, with automated testing at each step.
Anyway, this was quite a lot of information - any questions / feedback / suggestions / criticism would be great. :)
Cool!
Some ideas:
- I think it might be good to integrate it in https://github.com/galaxyproject/planemo-ci-action
- Maybe it would be better to set up the workflow in the repo itself?
- I think updating test data could be implemented as a slash command, so we can trigger it from a PR
- Is it possible to skip the autoupdate, e.g. if we close an "auto PR" (maybe the update is too minor or irrelevant), so that the bot does not reopen a new one each week?
Sounds like a great idea. The implementation and features look solid enough that the possible fear of adding a lot of noise to the repo seems like a non-issue.
What is the logic for the test data updating? Does it just update the test results, inherently trusting the test run execution? I do not really understand the linked test-data example, since e.g. the file tools/gromacs/test-data/newbox.pdb does not seem to be used in the tests for any of the altered wrappers?
@bernt-matthias @martenson thanks a lot for the comments
I think it might be good to integrate it in https://github.com/galaxyproject/planemo-ci-action
Maybe it would be better to set up the workflow in the repo itself?
+1, the only (potential) disadvantage for the second part is that you have to store the github access credentials for the bot separately in each repo.
I think updating test data could be implemented as a slash command, so we can trigger it from a PR
slash command = something like @planemo-autoupdate update data? Yes, very nice idea, then it can even be used for non-bot-created PRs as well.
Is it possible to skip the autoupdate, e.g. if we close an "auto PR" (maybe the update is too minor or irrelevant), so that the bot does not reopen a new one each week?
I followed BiocondaBot quite closely for the implementation, so if you close a PR, it will be reopened the next week. Obviously this can be viewed as a bug; the question is just what the alternative is. On bioconda they use a do not close tag for unwanted PRs. There is also the option to disable updates more permanently by adding the tool to the skip list, e.g. https://github.com/planemo-autoupdate/autoupdate/blob/main/bgruening_skip_list.
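As a rough illustration of the skip list mechanism, assuming the file simply lists one tool directory per line (which is how the linked bgruening_skip_list looks); the entries below are invented:

```python
# Illustrative only: filter the repos found by `planemo ci_find_repos`
# against a plain-text skip list before autoupdating.
SKIP_LIST = """\
tools/mitos
tools/stacks
"""

skip = {line.strip() for line in SKIP_LIST.splitlines() if line.strip()}
found_repos = ["tools/bwa", "tools/mitos", "tools/stacks"]   # made-up example
to_update = [repo for repo in found_repos if repo not in skip]
print(to_update)   # ['tools/bwa']
```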
adding a lot of noise to the repo
There will be quite a lot of noise when it's first implemented, but after that I think it should be quite manageable.
What is the logic for the test data updating? Does it just update the test results, inherently trusting the test run execution?
Yes, it effectively runs planemo test --update_test_data after the autoupdate.
tools/gromacs/test-data/newbox.pdb does not seem to be used in the tests for any of the altered wrappers?
You have to bear in mind that a macro is being updated - the tool in question is gmx_setup. I have to admit the example is quite artificial (I deleted some lines from the test data myself and pushed to the branch, just to check this is working).
What is the logic for the test data updating? Does it just update the test results, inherently trusting the test run execution?
Yes, it effectively runs planemo test --update_test_data after the autoupdate.
Should we do that only if a first planemo test fails?
What is the logic for the test data updating? Does it just update the test results, inherently trusting the test run execution?
Yes, it effectively runs planemo test --update_test_data after the autoupdate.
Should we do that only if a first planemo test fails?
I modified the update_test_data behaviour a bit in a separate PR, because I found it quite confusing: https://github.com/galaxyproject/planemo/pull/1079. So it now only updates the test data if the initial test fails.
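So the behaviour after that PR can be sketched as follows; this is a simplified outline of the test step, not planemo's internals, and the tool path is a placeholder:

```python
# Sketch: run the tests once after the version bump and only regenerate the
# test data if that first run fails.
import subprocess

def test_and_maybe_update(tool_xml):
    first_run = subprocess.run(["planemo", "test", tool_xml])
    if first_run.returncode != 0:
        # Outputs changed with the new version, so refresh the test data.
        subprocess.run(["planemo", "test", "--update_test_data", tool_xml], check=True)

test_and_maybe_update("tools/example/example.xml")   # placeholder path
```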
Is it possible to skip the autoupdate, e.g. if we close an "auto PR" (maybe the update is too minor or irrelevant), so that the bot does not reopen a new one each week?
I followed BiocondaBot quite closely for the implementation, so if you close a PR, it will be reopened the next week. Obviously this can be viewed as a bug; the question is just what the alternative is. On bioconda they use a do not close tag for unwanted PRs. There is also the option to disable updates more permanently by adding the tool to the skip list, e.g. https://github.com/planemo-autoupdate/autoupdate/blob/main/bgruening_skip_list.
As it was maybe not clear: the bot will push to an existing branch if possible rather than making a new PR. So if you decide you don't care about version 1.1, leaving the PR open has the advantage that the bot doesn't bother you again until it pushes version 1.2, without requiring that the autoupdate is completely disabled for that tool.
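For clarity, the branch handling can be pictured like this; the branch name and remote are placeholders, and PR creation via the GitHub API is omitted:

```python
# Sketch of "push to the existing autoupdate branch if there is one,
# otherwise push a new branch and open a PR".
import subprocess

def branch_exists_on_remote(branch, remote="origin"):
    out = subprocess.run(
        ["git", "ls-remote", "--heads", remote, branch],
        capture_output=True, text=True, check=True,
    )
    return bool(out.stdout.strip())

branch = "autoupdate/example-tool"   # hypothetical branch name
is_new = not branch_exists_on_remote(branch)
subprocess.run(["git", "push", "origin", f"HEAD:{branch}"], check=True)
if is_new:
    # First push for this tool: open a new PR via the GitHub API (not shown).
    pass
```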
the only (potential) disadvantage for the second part is that you have to store the github access credentials for the bot separately in each repo.
I would see this as an advantage :)
slash command = something like @planemo-autoupdate update data? Yes, very nice idea, then it can even be used for non-bot-created PRs as well.
Exactly, we already do this for triggering the CI workflow:
https://github.com/galaxyproject/tools-iuc/blob/master/.github/workflows/slash.yaml https://github.com/galaxyproject/tools-iuc/blob/55de003830db380639df997cdc5810df8022f63e/.github/workflows/ci.yaml#L6
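Wired up that way, the command side of a slash workflow could look roughly like the Python sketch below; the trigger phrase and tool path are placeholders, and the real trigger configuration lives in the YAML files linked above:

```python
# Sketch of a comment-triggered ("slash command") step running inside
# GitHub Actions on issue_comment events.
import json
import os
import subprocess

with open(os.environ["GITHUB_EVENT_PATH"]) as fh:   # event payload provided by Actions
    event = json.load(fh)

body = event.get("comment", {}).get("body", "")
if body.strip().startswith("@planemo-autoupdate update data"):   # hypothetical trigger phrase
    # Regenerate test data for the tool(s) touched by the PR (placeholder path).
    subprocess.run(["planemo", "test", "--update_test_data", "tools/example/example.xml"])
```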
How about adding tools to this procedure successively by adding them to an allowlist file? Maybe start with the most frequently used tools on usegalaxy.*. Once the majority of the tools are in, we can remove the allowlist and apply it to all tools.
How about adding tools to this procedure successively by adding them to an allowlist file? Maybe start with the most frequently used tools on usegalaxy.*. Once the majority of the tools are in, we can remove the allowlist and apply it to all tools.
What are your concerns here? I think we can be pragmatic here, enable it and see how it goes. Every PR is useful as a notification that there are potential updates, and it's also good material for every paper-cut day.
Probably you are right. My concern was that we would be flooded with PRs. Maybe it would be good to know up front how many PRs would be created.
@simonbray could you enable this one but restrict the PRs per week to 10 for the time being?
Thanks!
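If a cap like that is wanted, it is trivial to bolt on; a minimal sketch, with the limit and repo names as placeholders:

```python
# Sketch of limiting the bot to at most 10 new/updated PRs per weekly run.
MAX_PRS_PER_RUN = 10

outdated = ["tools/bwa", "tools/hisat2", "tools/stacks"]   # repos found out of date
for repo in outdated[:MAX_PRS_PER_RUN]:
    print(f"would open or refresh an autoupdate PR for {repo}")
```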
The bot has started opening some PRs, e.g. https://github.com/galaxyproject/tools-iuc/pull/3717
One thing which could be nice would be to automatically add a tool-autoupdate label to the PRs. I think the bot account would need triage permissions for this?
Cool @simonbray. MITOS is already a difficult case since we maintain 1.x and 2.x .. I guess similar things also happen for other tools, e.g. stacks.
MITOS is already a difficult case since we maintain 1.x and 2.x .. I guess similar things also happen for other tools, e.g. stacks.
Okay, good to know. Then I guess it just needs to go on the skip list (or at least the 1.x version does).
Nice! I haven't looked at how this works, but would it be possible to have the new version number in the PR title? Of the main requirement, if multiple are updated.
One thing which could be nice would be to automatically add a tool-autoupdate label to the PRs. I think the bot account would need triage permissions for this?
How about moving the workflow to this repo? This might simplify the permission problem, no?
Nice! I haven't looked at how this works, but would it be possible to have the new version number in the PR title? Of the main requirement, if multiple are updated.
Yes, it's possible, though it would require a bit of an update to the Planemo code.
One thing which could be nice would be to automatically add a tool-autoupdate label to the PRs. I think the bot account would need triage permissions for this?
How about moving the workflow to this repo? This might simplify the permission problem, no?
Probably - I've never experimented with opening PRs directly from GH actions.
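For the labelling itself, something along these lines should work against the GitHub REST API (labels on PRs go through the issues endpoint); the repo, PR number and token handling are illustrative, and the bot account would indeed need at least triage permission:

```python
# Sketch: add the tool-autoupdate label to a freshly opened PR.
import os
import requests

def add_label(repo, pr_number, label, token):
    resp = requests.post(
        f"https://api.github.com/repos/{repo}/issues/{pr_number}/labels",
        headers={"Authorization": f"token {token}"},
        json={"labels": [label]},
    )
    resp.raise_for_status()

add_label("galaxyproject/tools-iuc", 3717, "tool-autoupdate", os.environ["GITHUB_TOKEN"])
```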
Would it make sense to prioritize auto-updates with small version jumps? Maybe judged by the difference in the version numbers or release dates...
The rationale would be that larger jumps are more likely to need human interaction. Smaller jumps are (potentially) easier to review, so we get a higher throughput of autoupdates. Thereby we might lower the likelihood of larger version jumps...
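Just to illustrate the suggestion, one way to rank pending updates by jump size; the metric below (major/minor/patch differences) is only one possible heuristic, and the tool names and versions are made up:

```python
# Sketch of sorting pending autoupdates so that smaller version jumps come first.
from packaging.version import Version

def jump_size(current, latest):
    cur, new = Version(current).release, Version(latest).release
    cur += (0,) * (3 - len(cur))   # pad to major.minor.patch
    new += (0,) * (3 - len(new))
    return tuple(n - c for c, n in zip(cur[:3], new[:3]))

pending = {"toolA": ("1.2.3", "1.2.4"), "toolB": ("1.2.3", "2.0.0")}
ordered = sorted(pending, key=lambda name: jump_size(*pending[name]))
print(ordered)   # ['toolA', 'toolB'] - the patch bump sorts before the major bump
```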
Smaller jumps are (potentially) easier to review, so we get a higher throughput of autoupdates.
We could also just not limit the updates and let the bot do its thing at max throughput :) Once a PR is open and not merged, it gets automatic updates with every new version that is released, see https://github.com/galaxyproject/tools-iuc/pull/3803
Thereby we might lower the likelihood of larger version jumps....
That one I do not follow. If parameters change, we need a human in the loop, no matter if the jump is small or big.
The autoupdater should not have closed https://github.com/galaxyproject/tools-iuc/pull/3721
That was probably unintended; @simonbray was working on the bot.
Yes, it was not intended, I was trying to fix something else.
Would it make sense to prioritize auto-updates with small version jumps? Maybe judged by the difference in the version numbers or release dates...
I think I tend to agree with Björn here. Or at least, there are a lot of other things that could be improved first; I would put that further down the priority list. :)
would it be possible to have the new version number in the PR title?
This has been implemented.
There was some discussion recently about unwanted autoupdate PRs (ping @bernt-matthias @abretaud @bgruening @wm75, who all commented in #4402, #4523, #4496). Until now the bot followed the same approach as @BiocondaBot (which the implementation follows quite closely), i.e. closed PRs get reopened immediately. Bioconda deal with this situation using the do-not-close tag to mark unwanted PRs: https://github.com/bioconda/bioconda-recipes/pulls?q=is%3Aopen+is%3Apr+label%3Ado-not-close
Since this seems unsatisfactory, I have now modified the bot's behaviour so that it will not open a new PR if an identical unmerged, closed PR already exists.
Is this ok? Or does anyone feel that all PRs should be merged, so there is a Galaxy version for every release, even if releases are functionally identical?
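For the record, the check being described can be sketched like this, assuming the bot can recognise its own PRs by head branch and title; the repo, branch and title below are placeholders:

```python
# Sketch: before opening a PR, look for a closed-but-unmerged PR from the same
# head branch with the same title (i.e. the same proposed version bump).
import requests

def identical_closed_pr_exists(repo, head, title, token):
    resp = requests.get(
        f"https://api.github.com/repos/{repo}/pulls",
        params={"state": "closed", "head": head},
        headers={"Authorization": f"token {token}"},
    )
    resp.raise_for_status()
    return any(pr["merged_at"] is None and pr["title"] == title for pr in resp.json())

# if identical_closed_pr_exists("galaxyproject/tools-iuc",
#                               "planemo-autoupdate:example-tool",
#                               "Update example-tool to 1.2.4",
#                               token):
#     skip: a human already rejected exactly this update
```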
Btw: for future discussion of the bot's behaviour, maybe we can switch to this issue and keep the individual PRs for discussion of the tools themselves? Otherwise it gets a bit confusing to connect the different comments.
@simonbray sounds good to me.
We could also discuss at the bot's repo...