automation improvement - workflow templates and github action definitions
During today's TDC call #4220 @handrews brought up the fact that we might have misalignment and duplication of workflows and scripts across repositories. It of course makes sense to rationalize that to improve reliability and reduce maintenance costs. I promised during the call that I'd recap my suggestions in an issue. Here are two ways we could rationalize that:
Duplicated scripts should really be GitHub actions
GitHub actions can be defined to wrap up scripts and their runtime environment. Not only does that enable easy reuse across multiple repos, it also makes consuming workflows simpler since they don't need to set up the runtime for the script (Node.js, PowerShell, Python, etc.).
Here is an example of a custom action that comments on pull requests based on an input file it parses. Note that it contains multiple things:
- a Dockerfile to set up what it needs.
- an action.yaml that gives metadata about the action itself, inputs, outputs
- the PowerShell script in this case (could really be anything)
- (not present) a readme to document how to consume the action itself.
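For illustration, the metadata file of such a Docker container action could look roughly like the sketch below. The action name, the `input-file` input, and the `comment-count` output are hypothetical and not taken from the linked example.

```yaml
# action.yaml (sketch; name, inputs and outputs are hypothetical)
name: Comment on pull request
description: Parses an input file and comments on the pull request accordingly
inputs:
  input-file:
    description: Path of the file to parse, relative to the workspace
    required: true
outputs:
  comment-count:
    description: Number of comments the script posted
runs:
  using: docker
  # Built from the Dockerfile next to this file when a consuming workflow runs
  image: Dockerfile
  args:
    # Passed to the container entrypoint, i.e. the wrapped script
    - ${{ inputs.input-file }}
```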
This action can now be consumed like any other in a workflow example
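A consuming workflow might look like the following sketch. The repository `example-org/comment-action`, its `v1` tag, the `input-file` input, and the assumption that the script reads the token from the environment are all hypothetical.

```yaml
# .github/workflows/comment.yaml in a consuming repository (sketch)
name: Comment on pull request
on:
  pull_request:
permissions:
  contents: read
  pull-requests: write
jobs:
  comment:
    # Docker container actions need a Linux runner
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Hypothetical action repository and tag; no runtime setup step is needed
      - uses: example-org/comment-action@v1
        with:
          input-file: results.json
        env:
          # Assumption: the wrapped script authenticates with this token
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```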
You can have multiple action definitions or a single one per repo; it's really up to you. You can also rely on workflows to test your action logic, making it more reliable. Additionally, actions are versioned using git tags, and upgrades can be automated via Dependabot. Another benefit of this approach is that you can run the script locally by just running the image, with no additional requirements beyond Docker. One downside of using a Dockerfile is that the container image needs to be built for every execution of the consuming workflow. This can be fixed by adding another workflow to the action's repository that builds the image and publishes it to a container registry such as GitHub Packages, and then updating the "image" field in the action.yaml to point to the published image.
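As a sketch of that last point, the `runs` section of the action metadata can reference a prebuilt image instead of the Dockerfile. This assumes the image is published to GitHub's container registry (ghcr.io); the image name, tag, and the `input-file` input are hypothetical.

```yaml
# Fragment of action.yaml (sketch): use a prebuilt image instead of the Dockerfile
runs:
  using: docker
  # Hypothetical image name and tag, published by a separate build workflow
  image: docker://ghcr.io/example-org/comment-action:1.0.0
  args:
    - ${{ inputs.input-file }}
```

With this in place the consuming workflow pulls the image rather than building it on every run.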
Reusable workflows
I think this documentation page is great as an introduction. The goal would be to have a central repo for all the reusable workflows, and refer to them in target repositories.
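A minimal sketch of the idea, shown as two separate files in one listing. The central repository `example-org/shared-workflows`, the workflow name, the `node-version` input, and the npm commands are hypothetical.

```yaml
# File 1, in the central repository (hypothetical):
# .github/workflows/validate.yaml
name: Validate
on:
  workflow_call:
    inputs:
      node-version:
        type: string
        default: '20'
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      # In a called workflow this checks out the calling repository
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ inputs.node-version }}
      - run: npm ci
      - run: npm test
---
# File 2, in a target repository: .github/workflows/validate.yaml
name: Validate
on:
  pull_request:
jobs:
  validate:
    # Hypothetical central repository and tag
    uses: example-org/shared-workflows/.github/workflows/validate.yaml@v1
    with:
      node-version: '20'
```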
Additional solutions
- composite actions in case we often repeat the same X steps across workflows.
- workflow templates in case we want to enable quick setup of new repositories.
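For the composite action option, a sketch could look like this. The bundled steps and the `node-version` input are hypothetical and only stand in for whatever steps we actually repeat.

```yaml
# action.yaml of a composite action (sketch; name and steps are hypothetical)
name: Set up Node.js and install dependencies
description: Bundles setup steps that would otherwise be repeated in every workflow
inputs:
  node-version:
    description: Node.js version to install
    default: '20'
runs:
  using: composite
  steps:
    - uses: actions/setup-node@v4
      with:
        node-version: ${{ inputs.node-version }}
    # run steps in a composite action must declare their shell explicitly
    - run: npm ci
      shell: bash
```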
I hope this provides good context for discussions. Let me know if you have any additional comments or questions.
Looking at the three incarnations of the respec workflow in the Arazzo-, OpenAPI-, and Overlay-Specification repos, the common structure is
1. Check out current branch
2. Check out deployment target branch into subfolder
3. Run HTML build script to modify subfolder
4. Create pull request for deployment target branch from modified subfolder
Would we start with an action for step 3 to get rid of the three slightly different copies of the HTML build scripts?
Or would we want to have a (composite?) action that performs all four steps so that the three reduced workflows would only differ in the value of a "spec" action parameter?
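To make the second option concrete, a reduced workflow might look like the sketch below. The action `example-org/respec-publish`, its `v1` tag, the trigger, and the permissions are assumptions; only the "spec" parameter comes from the question above.

```yaml
# .github/workflows/respec.yaml (sketch)
name: respec
on:
  # Trigger chosen for illustration only
  workflow_dispatch:
jobs:
  respec:
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
    steps:
      # Hypothetical composite action wrapping all four steps
      - uses: example-org/respec-publish@v1
        with:
          # The only value that would differ between the three repositories
          spec: arazzo
```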
The same could be used for the schema-publish workflow in the OpenAPI- and Overlay-Specification repos.
The schema-tests and validate-markdown workflows are trivial and I don't see the need for an action there.
I'd start with making step 3 an action: it's going to be atomic and easier to test. You can always make a composite one after you've done so.
Step 3 looks to be the same in all three repos on the surface:
- name: run main script
  run: scripts/md2html/build.sh
Inside the repo-specific build.sh the main difference is how the list of editors is constructed per specification version, with the most complicated flavor in the OAS repo:
https://github.com/OAI/OpenAPI-Specification/blob/ba75c2949fc475367d80a1178fd9a71a299e2375/scripts/md2html/build.sh#L13-L37
This could be harmonized by materializing the editors lists as additional files versions/3.1.1-editors.md etc.
Then the build script can become repo-agnostic.
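A rough sketch of what a repo-agnostic step could look like once the editors files exist. The script `scripts/md2html/render.sh` and its two arguments are hypothetical; only the `versions/<version>-editors.md` naming comes from the suggestion above.

```yaml
# Workflow step fragment (sketch)
- name: Build HTML for each specification version
  shell: bash
  run: |
    # Skip the loop entirely if no editors files exist
    shopt -s nullglob
    for editors in versions/*-editors.md; do
      # versions/3.1.1-editors.md -> versions/3.1.1.md
      spec="${editors%-editors.md}.md"
      # Hypothetical repo-agnostic script taking the spec and its editors list
      scripts/md2html/render.sh "$spec" "$editors"
    done
```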
@OAI/tsc: any concerns?
@ralfhandl makes sense to me. The whole thing with MAINTAINERS and EDITORS has become very convoluted. Just saving the correct snapshot per version seems much easier to understand, and then we don't have multiple overlapping files because the editors and TSC aren't quite the same set of people.
I'd like to suggest that we factor the infrastructure out into its own repo and submodule it into the various repositories and branches that need it. The number of auto-merges for infrastructure things is getting pretty ridiculous (although I know that is partially an artifact of start-up). But the core logic doesn't change. A submoduled repository is a better way to keep things synchronized.
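On the workflow side, a submodule mainly changes the checkout step. A sketch, assuming a hypothetical submodule mounted at `infra/` that contains the shared scripts:

```yaml
# Fragment of a workflow job (sketch)
steps:
  - uses: actions/checkout@v4
    with:
      # Also fetch submodules, including the hypothetical shared "infra" one
      submodules: true
  # Hypothetical path: the shared build script lives inside the submodule
  - name: run main script
    run: infra/scripts/md2html/build.sh
```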
Fully agree, just wanted to get it up and running before adding additional complexity.
Then factor it out and reuse it for Overlay and Arazzo.