community icon indicating copy to clipboard operation
community copied to clipboard

REQUEST: Repository maintenance on `opentelemetry-rust`

Open scottgerring opened this issue 10 months ago • 22 comments

Affected Repository

https://github.com/open-telemetry/opentelemetry-rust

Requested changes

Please grant access of https://github.com/organizations/open-telemetry/settings/actions/runner-groups/8 (Benchmark Bare Metal Runners) to this repository, as was done for opentelemetry-go in #2266. We're running our benchmarks in the regular pools and the results are slow and variable.

Related: open-telemetry/opentelemetry-rust#2706

Purpose

Enable bare metal runners to run benchmarks.

Runner group name: 'Benchmark Bare Metal Runners'

Expected Duration

permanently

Repository Maintainers

* [@open-telemetry/rust-maintainers](https://github.com/orgs/open-telemetry/teams/rust-maintainers)

Adding @cijothomas for visibility

scottgerring avatar Mar 14 '25 13:03 scottgerring

done! please check and close the issue once you've confirmed it's working, thanks

trask avatar Mar 14 '25 15:03 trask

@trask thanks! I'll test it out in the next couple days and loop back to you.

scottgerring avatar Mar 17 '25 11:03 scottgerring

Hey @trask , our job doesn't seem to be picked up by the runners. I'm not sure if there's some other configuration we need - I think I saw something, somewhere, about whitelisting particular workflows?

Here's a blocked run and here's the config - but really I've just lifted the runs-on from opentelemetry-go:

jobs:
  runBenchmark:
    name: run benchmark
    permissions:
      pull-requests: write
    runs-on: self-hosted

scottgerring avatar Mar 17 '25 16:03 scottgerring

I think I saw something, somewhere, about whitelisting particular workflows?

oh, you're right!

here's the current allowed workflows:

open-telemetry/opentelemetry-collector-contrib/.github/workflows/load-tests.yml@refs/heads/main, open-telemetry/opentelemetry-go/.github/workflows/benchmark.yml@refs/heads/main, open-telemetry/opentelemetry-java/.github/workflows/benchmark.yml@refs/heads/main, open-telemetry/opentelemetry-java/.github/workflows/benchmark-tags.yml@refs/heads/main, open-telemetry/opentelemetry-js/.github/workflows/benchmark.yml@refs/heads/main, open-telemetry/opentelemetry-python/.github/workflows/benchmarks.yml@refs/heads/main

it looks like we're only running these on main in order to conserve the limited bare metal runners, can you do something similar in rust repo?

trask avatar Mar 17 '25 16:03 trask

We're new to this in Rust-land, so this might not be the best idea, but we've made it contingent on setting a label on the PR so it runs:

https://github.com/open-telemetry/opentelemetry-rust/blob/18e87185531c723dc76bd0dd404889535b22b10e/.github/workflows/pr_criterion.yaml#L9 - this way we can look at a particular PR and say "what's the impact of this thing we're suspicious of on performance", through the label on, and opt into the benchmarks:

    if: ${{ contains(github.event.pull_request.labels.*.name, 'performance') }}

Do you think we could get away with a wildcard to support this, or are we going a bit off the rails?

scottgerring avatar Mar 17 '25 16:03 scottgerring

hm, it looks like the entries in that list "must be pinned to a ref, tag, or full SHA".

so I could add

open-telemetry/opentelemetry-rust/.github/workflows/pr_criterion.yaml@refs/pull/2813/merge

but not something that matches any PR ref branch

I'm guessing the concern with removing this restriction is that any PR could add a workflow that uses the bare metal runner

cc @jpkrohling @open-telemetry/technical-committee as admin for the bare metal runners in case they have some background on this

trask avatar Mar 17 '25 16:03 trask

Hey @trask thanks for the info!

I'll leave it using shared workers for our opt-in PR perf regressions then, and setup the job to do HEAD vs HEAD~1 perf diff on main using this pool for a "higher quality comparison".

I figure this way we get the benefit of the bare metal runner for high-fidelity testing on main and follow the same pattern as the other projects.

What do you think?

I'll see if I can structure this within the same workflow file nicely, or not, and ping back on this thread with the full reference to add to the entries list 💪

scottgerring avatar Mar 18 '25 07:03 scottgerring

Hey @trask , thanks for the patience! Can you bless this job ?

The workflow file is open-telemetry/opentelemetry-rust/.github/workflows/benchmark.yml and it runs on pushes to main

https://github.com/open-telemetry/opentelemetry-rust/actions/runs/13948851079/job/39042702985

scottgerring avatar Mar 19 '25 14:03 scottgerring

done, give it another try

trask avatar Mar 19 '25 14:03 trask

it runs! But, I don't have a good idea what's in the env - we're missing e.g. unzip -> https://github.com/open-telemetry/opentelemetry-rust/actions/runs/13950335436/job/39047893019

I'm not sure what is the easier path forward. If it's easy enough to add things to the env - e.g. unzip - maybe we can start there? But it's hard to say how deep the rabbit hole will go.

Alternatively, i've made a branch that you could explicitly bless, and I can then use that to fix the build, then notify you i'm done, then un-bless it. Hacking on it it in main is pretty undesirable.

https://github.com/scottgerring/opentelemetry-rust/tree/chore/fix-ci-environment-deps

Happy to follow your lead here, and apologies this is fiddlier than I expected!

scottgerring avatar Mar 19 '25 15:03 scottgerring

Alternatively, i've made a branch that you could explicitly bless, and I can then use that to fix the build, then notify you i'm done, then un-bless it. Hacking on it it in main is pretty undesirable.

makes sense

I'm not sure if it will work to bless the branch in your repo, can you push that branch to the otel repo? or open a PR from that branch and I should be able to bless the PR ref tag, e.g. open-telemetry/opentelemetry-rust/.github/workflows/benchmark.yaml@refs/pull/2813/merge

trask avatar Mar 19 '25 16:03 trask

@trask sure ! Here you go --> https://github.com/open-telemetry/opentelemetry-rust/pull/2840

I'll ping you back here when it's finally all done and dusted

scottgerring avatar Mar 20 '25 06:03 scottgerring

And here's a build on the test branch's PR that should get picked up, including the ref:

https://github.com/open-telemetry/opentelemetry-rust/actions/runs/13986932236/job/39162219384?pr=2840

Job defined at: open-telemetry/opentelemetry-rust/.github/workflows/benchmark.yml@refs/pull/2840/merge 

scottgerring avatar Mar 21 '25 07:03 scottgerring

@scottgerring sorry was out for a few days, just tried this and looks like it needs to be a real repo branch:

Provided ref "refs/pull/2840/merge" is ambiguous for workflow "open-telemetry/opentelemetry-rust/.github/workflows/benchmark.yml@refs/pull/2840/merge". Please clarify whether the ref refers to a branch or a tag, e.g. "refs/heads/main" or "refs/tags/main".

trask avatar Mar 27 '25 14:03 trask

@scottgerring sorry was out for a few days, just tried this and looks like it needs to be a real repo branch:

Provided ref "refs/pull/2840/merge" is ambiguous for workflow "open-telemetry/opentelemetry-rust/.github/workflows/benchmark.yml@refs/pull/2840/merge". Please clarify whether the ref refers to a branch or a tag, e.g. "refs/heads/main" or "refs/tags/main".

Hey @trask no worries!

So i've got this branch now and associated PR:

https://github.com/open-telemetry/opentelemetry-rust/tree/scottgerring/benchmark-fix

https://github.com/open-telemetry/opentelemetry-rust/pull/2874

.... but the branch has protection on it, so that I can create it, but not push to it or delete it once its created. Which will naturally mean a lot of back-and-forth with you to get the thing going as I'm unlikely to get it the first CI run 😱

Would it be possible to remove the protection from that branch so I can push changes to it, and bless it for the workers?

(Tagging @cijothomas here for visibility, particularly as I'm asking to remove a branch protection)

scottgerring avatar Mar 28 '25 08:03 scottgerring

but not push to it

what's the error you get?

can you push if it's not a force push?

trask avatar Apr 10 '25 15:04 trask

git push --force
remote: error: GH006: Protected branch update failed for refs/heads/scottgerring/benchmark-fix.        
remote: 
remote: - Required status check "EasyCLA" is expected.        
To github.com:open-telemetry/opentelemetry-rust.git
 ! [remote rejected]   scottgerring/benchmark-fix -> scottgerring/benchmark-fix (protected branch hook declined)
error: failed to push some refs to 'github.com:open-telemetry/opentelemetry-rust.git'

Actually looking now, I wonder if the "EasyCLA" push guard is misconfigured? I can see that it's ✅ on the PR, and i've certainly signed it at this point.

scottgerring avatar Apr 10 '25 15:04 scottgerring

I've added the suggested dependabot branch protection: https://github.com/open-telemetry/community/blob/main/docs/how-to-configure-new-repository.md#branch-protection-rule-dependabot

if you push a branch now with that matches the pattern dependabot/**/* then it should work

the idea is that **/** is a catch all and has higher bar just in case it contains a long-lived branch like a release branch

trask avatar Apr 10 '25 15:04 trask

Hey @trask - thanks! And sorry i've been a bit distracted these last 2 weeks. I still can't push the second time - still this business with EasyCLA:

> git push

....

remote: Resolving deltas: 100% (4/4), completed with 4 local objects.
remote: error: GH006: Protected branch update failed for refs/heads/dependabot/scottgerring/benchmark-fix.
remote:
remote: - Required status check "EasyCLA" is expected.
To github.com:open-telemetry/opentelemetry-rust.git
 ! [remote rejected]   dependabot/scottgerring/benchmark-fix -> dependabot/scottgerring/benchmark-fix (protected branch hook declined)
error: failed to push some refs to 'github.com:open-telemetry/opentelemetry-rust.git'

To zoom back out a bit - the issue is, we can't easily test what works and what doesn't on shared-workers without hacking on main, now. If you could alternatively share what the environment is (e.g. - ubuntu / arch / whatever) - I can hack up a solution that seems like it should work using a similar environment, and then just merge that in and hopefully get it first try.

I'm cognisant this is dragging on and perhaps this would make it easier for you!

scottgerring avatar Apr 23 '25 12:04 scottgerring

no worries! it looks like I set up the dependabot/**/* branch protection incorrectly, can you try again?

trask avatar Apr 23 '25 13:04 trask

It works! Here's the PR I hope you can bless to use the workers --> https://github.com/open-telemetry/opentelemetry-rust/pull/2942

And here's a run attempting to use self-hosted: https://github.com/open-telemetry/opentelemetry-rust/actions/runs/14620756761/job/41019880580?pr=2942

scottgerring avatar Apr 23 '25 14:04 scottgerring

Looks like we got it to work: https://github.com/open-telemetry/opentelemetry-rust/pull/2942#discussion_r2056229266

@scottgerring let's leave this issue open until you're done with the branch so I don't forget to clean up the branch list on the github runner

trask avatar Apr 23 '25 18:04 trask