community icon indicating copy to clipboard operation
community copied to clipboard

REQUEST: Repository maintenance on opentelemetry-collector-contrib

Open mx-psi opened this issue 2 years ago • 22 comments

Affected Repository

https://github.com/open-telemetry/opentelemetry-collector-contrib

Requested changes

Enable the Require merge queue setting on the branch protection rules for the main branch with "squash" merge method

Purpose

The Github documentation linked above puts it succintly:

The merge queue provides the same benefits as the Require branches to be up to date before merging branch protection, but does not require a pull request author to update their pull request branch and wait for status checks to finish before trying to merge.

We have recently had several issues related to 'race conditions' between PRs being merged. To prevent this and ensure a better contributing experience, we want to enable the merge queue.

This was discussed on the January 31st Collector SIG meeting and on this issue https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/30880#issuecomment-1919564897.

Expected Duration

Permanently

Repository Maintainers

  • @open-telemetry/collector-contrib-maintainer

mx-psi avatar Feb 09 '24 10:02 mx-psi

Enabled with the following settings:

image

The Learn more link points here in case you need it: https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/managing-a-merge-queue#triggering-merge-group-checks-with-github-actions

Please let me know if it works as intended or if the settings would need any tweaking.

arminru avatar Feb 09 '24 10:02 arminru

Thank you Armin! :) Will loop back after we follow the remaining steps and confirm that this works as intended

mx-psi avatar Feb 09 '24 12:02 mx-psi

@arminru we've added merge_group: to all our required checks, but we're not seeing the pattern described in https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/incorporating-changes-from-a-pull-request/merging-a-pull-request-with-a-merge-queue. Any thoughts?

TylerHelmuth avatar Feb 13 '24 17:02 TylerHelmuth

Hi @TylerHelmuth! I configured the main branch rules as per the screenshot above again. On a PR with both sufficient reviews and passing checks I'm now presented with a "Merge when ready" button. I also tried it out in https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/31253#event-11804184349 (and removed it from the group right after before it was actually merged). Please let me know if it works for you now 🤞

arminru avatar Feb 14 '24 16:02 arminru

Once enqueued you should find the pending PR here: https://github.com/open-telemetry/opentelemetry-collector-contrib/queue/main

arminru avatar Feb 14 '24 16:02 arminru

I can see the new merge button name so it seems like it is working :) image

Thanks again Armin

mx-psi avatar Feb 14 '24 16:02 mx-psi

With the merge queue working I'd like to leave this issue open for a bit longer while to observe the process. Things on our mind:

  1. How well does it handle CI flakes
  2. Queue times - we'll not be spinning up jobs for the PR, the merge queue, and main. Because of our matrices that is a lot of jobs. @arminru does the OpenTelemetry org have a max number of concurrent jobs?

TylerHelmuth avatar Feb 14 '24 16:02 TylerHelmuth

I'd need to reach out to our enterprise owners in the LF/CNCF for definitive answer but from what I can tell we're on an Enterprise plan that should come with the limits specified here: https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration#usage-limits Not sure if I'd see any negotiated deviations from these plans, if any, in our org settings.

arminru avatar Feb 14 '24 16:02 arminru

A bigger problem:

image

TylerHelmuth avatar Feb 14 '24 16:02 TylerHelmuth

I'd need to reach out to our enterprise owners in the LF/CNCF for definitive answer but from what I can tell we're on an Enterprise plan that should come with the limits specified here: https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration#usage-limits Not sure if I'd see any negotiated deviations from these plans, if any, in our org settings.

CNCF negotiated a bump from 180 to 500 concurrent runners about a year ago, but it looks like GitHub may have bumped all Enterprise plans to (minimum) 1000 concurrent runners since then

trask avatar Feb 14 '24 17:02 trask

@arminru besides the EasyCLA issue we are seeing huge queues since the merge queue was enabled. Can you bump Build Concurrency to 100?

TylerHelmuth avatar Feb 14 '24 17:02 TylerHelmuth

Or @trask

TylerHelmuth avatar Feb 14 '24 17:02 TylerHelmuth

If that doesn't help I think we'll need to revert and retry, maybe on a test repository or something smaller like the helm charts.

TylerHelmuth avatar Feb 14 '24 17:02 TylerHelmuth

Actually let's just revert it bc we know EasyCLA will be an issue.

TylerHelmuth avatar Feb 14 '24 17:02 TylerHelmuth

Chatting with @arminru we've disabled merge queue again so Contrib is back to its starting point (and actions are running again).

We'll try again soon with a simpler repository (or maybe even a test repo?) to see what we can do about EasyCLA and not blocking PR actions.

TylerHelmuth avatar Feb 14 '24 17:02 TylerHelmuth

We'll try again soon with a simpler repository (or maybe even a test repo?) to see what we can do about EasyCLA and not blocking PR actions.

It may be worth asking in the #easycla channel, they have been pretty responsive there in the past. And if not, I can open an EasyCLA ticket with CNCF.

trask avatar Mar 27 '24 00:03 trask

I can help move this forward in the helm chart repo or a test repo (if someone makes it) if someone wants to figure out the easycla stuff

TylerHelmuth avatar Mar 27 '24 01:03 TylerHelmuth