opensearch-build icon indicating copy to clipboard operation
opensearch-build copied to clipboard

[PROPOSAL] Automate merging of branch version bump PRs

Open dbwiddis opened this issue 1 year ago β€’ 21 comments

What/Why

What are you proposing?

We should enable repo/admin-level automation to merging branch version bump PRs because maintainers aren't doing it.

What users have asked for this feature?

I requested it in the 2.14.0 retrospective.

I have repeatedly nagged other plugin maintainers to merge upstream dependencies so I can close mine.

I have received pushback from other plugin maintainers for the above nagging, blocking my efforts to actually close open PRs on repos that I maintain.

Other mentions of automation or enforcement of version bumping:

  • https://github.com/opensearch-project/opensearch-build/issues/2706
  • https://github.com/opensearch-project/opensearch-build/issues/2949

What problems are you trying to solve?

This, on a repo I maintain (source). Screenshot 2024-08-08 at 10 46 48β€―PM

And this on an upstream dependency, blocking me from merging mine (source). Screenshot 2024-08-08 at 10 47 20β€―PM

And this on another upstream dependency, blocking me from merging mine (source). Screenshot 2024-08-08 at 10 49 17β€―PM

Organization-wide, opensearch-project has 204 unmerged version bump PRs.

These are literally mouse clicks away from being closed, but it takes upstream repos to lead the way.

What is the developer experience going to be?

For maintainers like me, the ability to merge PRs on their repo because upstream repositories have appropriate versioning.

For maintainers who don't care about these PRs, they won't have to lift a finger. How awesome is that?

More seriously, minor version bump PRs are assumed to have been cut as part of automation (Autosync) and multiple developers spent multiple hours dealing with the aftermath of a branch cut prior to a version bump in the 2.14 release. That's at least an hour of my time wasted, multiplied by at least 4 other developers.

Are there any security considerations?

Patch version bumps are one of the biggest culprits here, because patch releases rarely happen.

However, when they do, it's usually for a very important issue that can't wait until the next release cycle. In this case, having plugins wait for multiple upstream dependencies to merge their patch version bumps could slow our ability to react to these.

Are there any breaking changes to the API

Nope.

What is the user experience going to be?

Seeing plugin repositories that are well-maintained without a huge backlog of ignored PRs that discourage them from contributing.

Are there breaking changes to the User Experience?

Nope.

Why should it be built? Any reason not to?

It should be built because automation already exists to create the PRs; they can be auto-merged by a bot with appropriate powers to do so, when all dependencies are met.

It should be built to save the time and effort of maintainers to do the same. In particular, GitHub action retries expire after 30 days, requiring a minute or so of effort to re-try a version bump PR after a month of upstream repos ignoring it... or similar effort to search and identify whether it's able to be merged. It's a distraction and complete waste of developer time.

What will it take to execute?

Whatever automation creates the PRs can be given the power to merge them if CI checks are gren.

Any remaining open questions?

Why don't we just require maintainers to do this? Actually, we do, in Release Checklists, which specify merging these version bumps as part of the checklist. There are 453 open Release Checklist issues and while 3.0.0 and 2.17.0 are a couple hundred of those, there are far more than that.

Acceptance Criteria

  • Auto increment version PRs should be merged as soon as they have a passing CI and maintainer approval.
  • If upstream (core) as well as depend plugins such as common-utils, job-scheduler have their PRs merged, re-run all other plugins' CI daily.
  • Remove any duplicate version increment workflows from repos to avoid confusion and false negatives.

dbwiddis avatar Aug 09 '24 06:08 dbwiddis

I notice that many of these do not have passing CI :(

dblock avatar Aug 09 '24 10:08 dblock

The version increment workflow is https://github.com/opensearch-project/opensearch-build/blob/main/.github/workflows/os-increment-plugin-versions.yml and https://github.com/opensearch-project/opensearch-build/blob/main/.github/workflows/osd-increment-plugin-versions.yml, moving this to opensearch-build.

dblock avatar Aug 09 '24 11:08 dblock

I notice that many of these do not have passing CI :(

for many that is because an upstream dependency hadn’t bumped when CI first ran. And they don’t push/retry.

dbwiddis avatar Aug 09 '24 14:08 dbwiddis

Auto-merge workflow that would automatically merge these PRs if the CI checks pass: https://github.com/opensearch-project/opensearch-build/blob/main/.github/workflows/automatic-merges.yml

gaiksaya avatar Aug 15 '24 16:08 gaiksaya

Linking some approaches to rerun failing CIs. https://github.com/opensearch-project/opensearch-build/issues/2706

zelinh avatar Aug 19 '24 22:08 zelinh

Following are my thoughts on having the version increment PR's auto merged.

  • Coming from https://github.com/opensearch-project/opensearch-build/issues/4225 Automatically run a min distribution build when a version increment is merged into OpenSearch core.

  • Start by having a discussion on update the code freeze for core earlier than the plugins. This way the code freeze for core is honored and everything is ready before plugin can finalize the changes. This should eliminate the last minute breaking changes and the plugins can use the finalized core artifact for version increment CI's.

  • Today we have an automation that creates the version increment PR's. Its upto the plugin teams to take care of the CI's related to the version increment PR's and get the PR's merged, a release manager will follow up until the PR's are merged (This needs to be automated).

  • For the automation we can have a workflow to ensure the plugin dependencies are build first:

    • Create a dependency tree which has information about the plugin dependencies.
    • For every plugin core is the dependency, the previous step https://github.com/opensearch-project/opensearch-build/issues/4225 solves the problem.
    • Now get the version increment PR's of the default plugins like job-scheduler, common-utils, security and ensure they are auto-merged once the CI's are passed. These default plugins uses core as dependency which is solved in the previous step.
    • Next, for the other plugins identified in the dependency tree, obtain the version increment pull requests for the dependent plugins, auto-merge them, and start building the artifacts. Once the dependent plugins are clean, retry the plugin CI processes, ensure they pass, and then merge the version increment pull requests.

A rough workflow explaining the above points:

             Create Dependency Tree
                      ↓
              Core Dependency Solved (Issue 4225)
                      ↓
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   ↓                                          ↓
Default Plugins                              Other Plugins
(Dependent only on core)                     (Dependent on core + upstream plugins)
   ↓                                          ↓
Fetch Version Increment PRs                Check Dependencies
   ↓                                          ↓
Run CI                                     Fetch Version Increment PRs for the Dependencies
   ↓                                          ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                          Ensure CI Passes and Build Dependencies (if needed)
↓               ↓                                  ↓
CI Passed    CI Failed                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   ↓               ↓                    ↓                 ↓
Auto-Merge PRs   Investigate & Fix     CI Passed       CI Failed
                    ↓                   ↓               ↓
                 Retry CI            Auto-Merge     Investigate & Fix β†’ Retry CI β†’ CI Passed β†’ Auto-Merge PRs
                    ↓                   PRs             ↓
                 CI Passed              ↓               ↓
               Auto-Merge PRs       Once merged, build the artifacts
                                           ↓
                                  Re-run the (current) plugin CI's since Dependencies are merged and built
                                           ↓
                                      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                                      ↓               ↓
                                    CI Passed       CI Failed
                                      ↓               ↓
                                  Auto-Merge       Investigate & Fix
                                      ↓               ↓
                                     End           Retry CI
                                                    ↓
                                                 CI Passed
                                                    ↓
                                             Auto-Merge PRs
                                                    ↓
                                                   End

Adding @gaiksaya @dblock @getsaurabh02 @dbwiddis @peterzhuamazon

prudhvigodithi avatar Sep 09 '24 20:09 prudhvigodithi

Thanks @prudhvigodithi for this detailed graph.

The automation app should become handy when trying to compose all the information and decision in one place, while access to all the repos at once with admin permissions.

We can use the input manifest to find the dependency tree between plugins, and decide on which plugins to take care before others. If a plugin is a dependency of the other, and it failed the checks, it should be hard block before moving to the next plugin.

We can then use metrics cluster and release dashboards to know where we are and even compose a dependency tree as a pre-requisite as the entry criteria.

Please let me know what you think.

Thanks.

peterzhuamazon avatar Sep 09 '24 20:09 peterzhuamazon

All we need are passing CIs. Something like this https://github.com/opensearch-project/opensearch-migrations/pull/940#issuecomment-2338884798 from bot's perspective for hard merges (which we should not) or have any auto-merge workflow added like commented above.

For CIs that have expired run (I believe after 30 days) opening and closing the PR should help.

gaiksaya avatar Sep 09 '24 20:09 gaiksaya

Thanks @gaiksaya and @peterzhuamazon for your inputs.

@gaiksaya for this All we need are passing CIs. we need the dependencies to be build 1st and for this we need the dependency version increment's to be merged before we can build the dependencies, once we have this yes I'm fine with the flow on re-trying and merging. What @peterzhuamazon added is also a good idea to leverage bot and metrics cluster if it can make the automation easy :).

prudhvigodithi avatar Sep 09 '24 20:09 prudhvigodithi

Thanks Both.

I feel like github actions is suited for individual workflow but not really suited to combined actions among multiple repos. Especially when we already have so much plugins, it is not easy to update all the repos if there is any changes, or need a centralized call on what to proceed next.

Therefore I was raising the use of automation app combined with metrics cluster to do the hard work and easier to maintain over time.

Thanks.

peterzhuamazon avatar Sep 09 '24 20:09 peterzhuamazon

I don't believe app would have the permission to re-run CIs. @prudhvigodithi I agree we need dependent components to build first. But opening and closing the PR once a day till they get merged is one of the solution to re-run the CIs. If the dependent components are ready by then, great! Else try again tomorrow.

gaiksaya avatar Sep 21 '24 00:09 gaiksaya

I don't believe app would have the permission to re-run CIs. @prudhvigodithi I agree we need dependent components to build first. But opening and closing the PR once a day till they get merged is one of the solution to re-run the CIs. If the dependent components are ready by then, great! Else try again tomorrow.

The app does have full access to re-run and trigger workflows: Workflows, workflow runs and artifacts

peterzhuamazon avatar Sep 23 '24 18:09 peterzhuamazon

Thank you @dbwiddis for the proposal.

Hi @peterzhuamazon , @gaiksaya , @prudhvigodithi , @getsaurabh02: How should we proceed with this forward? We did see with 2.18 version bump as well where until and unless upstream have taken care of version bump, dependency plugins are unable to do theirs.

minalsha avatar Sep 25 '24 22:09 minalsha

Hi @peterzhuamazon Does it make sense to move this issue to https://github.com/opensearch-project/automation-app ? We can start with merging PRs that have passing CIs and then iterate on the logic for re-running the CI's timely, etc. WDYT?

gaiksaya avatar Nov 14 '24 00:11 gaiksaya

Hi @peterzhuamazon Does it make sense to move this issue to https://github.com/opensearch-project/automation-app ? We can start with merging PRs that have passing CIs and then iterate on the logic for re-running the CI's timely, etc. WDYT?

I think it does make sense tho we need to understand the scale of the problem. We might need to create an additional app just to handle all PR related activities.

Thanks.

peterzhuamazon avatar Nov 14 '24 21:11 peterzhuamazon

Hi @peterzhuamazon Does it make sense to move this issue to https://github.com/opensearch-project/automation-app ? We can start with merging PRs that have passing CIs and then iterate on the logic for re-running the CI's timely, etc. WDYT?

I think it does make sense tho we need to understand the scale of the problem. We might need to create an additional app just to handle all PR related activities.

Thanks.

Thanks! We can use this issue as a problem statement for the same. Coming from https://github.com/opensearch-project/opensearch-build/issues/5171 auto-merging of the version bump PRs also applies to the core repos. See this comment. Instead of implementing an individual solution for both core repos, I think having a generic one would be great. Moving this issue to automation-app repo to discuss the in and out of scope requirements, approach, and implementation of the same.

gaiksaya avatar Nov 14 '24 21:11 gaiksaya

[Catch All Triage - 1, 2, 3, 4, 5]

dblock avatar Nov 18 '24 17:11 dblock

I was experimenting with GitHub's auto-merge feature https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/incorporating-changes-from-a-pull-request/automatically-merging-a-pull-request It looks like a good solution.

Please note, enabling auto-merge does not mean all the pull requests will merged as soon as the CI is green. It's per pull request based.

You can allow setting pull requests to merge automatically once all required reviews and status checks have passed.

Below are the 2 approaches:

1. Use same workflow to enable auto-merge

The same workflow that creates the pull request can enable the auto merge feature too. https://cli.github.com/manual/gh_pr_merge

Command to use:

gh pr merge <url>/<number> --auto --squash

Action items to follow:

  1. Enable auto-merge in the repo settings that allows the maintainer or users with write access to enable auto-merge for PR.
  2. Add the above command to the workflow to enable auto-merge

Pros:

  • No need to maintain another workflow
  • Tightly coupled with existing automation
  • Minimal change

Cons:

  • Blast radius increases. If the workflow has issues (for example, some action is failing) the overall workflow will be failing
  • Tightly couples version increment and PR merge
  • Need another solution to manage dependency tree and run CIs if PRs are stuck in failed CI state even though upstream plugins are good.

2. Use metrics data to enable auto-merge for all PRs

We also indexing all this data into opensearch metrics cluster https://metrics.opensearch.org Another approach is to fetch the data and run similar gh commands for those.

Action items to follow:

  1. Enable auto-merge in the repo settings that allows the maintainer or users with write access to enable auto-merge for PR.
  2. Create a jenkins library to fetch data and process it
  3. Add the above command to the workflow to enable auto-merge via jenkins workflow

Pros:

  • Separate workflows so no overlap in the blast radius
  • Both actions (creating a pull request and merging the existing ones) can be managed independently
  • Can be expanded to check the existence of mandatory plugins such as job-scheduler, common-utils, remote-cdk, etc.

Cons:

  • Additional management on jenkins workflow/libraries
  • Depends heavily on metrics data. Missing data = no action

Need your opinion and feedback on this please @prudhvigodithi @dbwiddis @peterzhuamazon @rishabh6788 @getsaurabh02

Thanks!

gaiksaya avatar Apr 29 '25 01:04 gaiksaya

Need your opinion and feedback on this please

I like both options!

Option 2 seems a more elegant solution but I'm not sure this really needs such a high tech solution. Most of the problems could probably be solved with a daily run that merges those with passing CI (or as you indicate, auto-merges) and rebases/retries everything else. Even multi-layered dependency chains would typically get resolved within a week or so with this strategy.

The key here is recognizing which repo(s) are the blockers if the process ends up stalled for some reason, and require manual intervention, and triggering appropriate alerts to that repo's maintainers.

Either way I'm happy to see progress toward some solution other than waiting on maintainers!

Dan

dbwiddis avatar Apr 29 '25 03:04 dbwiddis

I would agree with Dan here that second option is more accurate and related to each repo's state based on dependency tree. Plus the trigger can be optimized based on data.

Thanks.

peterzhuamazon avatar Apr 29 '25 04:04 peterzhuamazon

Thank you for your inputs @peterzhuamazon @dbwiddis Will add some acceptance criteria to this issue to take it to completion.

gaiksaya avatar May 01 '25 19:05 gaiksaya