Set up bare metal GitHub self-hosted runner on Oracle Cloud

Open trask opened this issue 5 months ago • 4 comments

Our current bare metal self-hosted runners are going away, so we are working on setting up a bare metal instance in Oracle Cloud as a GitHub self-hosted runner.

So far

  • [x] Set up the Bare Metal instance in Oracle Cloud
  • [x] Set it up as a GitHub self-hosted runner (but need to leave it disabled for now, see below)

Todo:

  • [ ] Migrate all existing usages of runs-on: self-hosted to runs-on: equinix-bare-metal
    • This is needed because all self-hosted runners have the self-hosted label, and so as soon as we enable the new self-hosted runner, those workflows will sometimes get one and sometimes get the other.
    • Usages of the new self-hosted runner should use runs-on: oracle-bare-metal-64cpu-512gb-x86-64 to differentiate.
  • [ ] Decide if we want to use container-based workflows in order to isolate environments, or do we want to run the jobs directly on the host, in which case what software do we need to install on the host?
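
As a rough illustration of the label split described above, a workflow could target each runner explicitly. This is only a sketch: the workflow name, job names, trigger, and steps are hypothetical, and only the two runs-on labels come from this issue.

```yaml
# Hypothetical workflow; names, trigger, and steps are illustrative only.
name: runner-label-sketch
on:
  push:
    branches: [main]

jobs:
  existing-runner:
    # Previously `runs-on: self-hosted`, which matches every self-hosted
    # runner because they all carry the `self-hosted` label.
    runs-on: equinix-bare-metal
    steps:
      - uses: actions/checkout@v4
      - run: echo "running on the existing bare metal runner"

  new-runner:
    # Label for the new Oracle Cloud bare metal runner.
    runs-on: oracle-bare-metal-64cpu-512gb-x86-64
    steps:
      - uses: actions/checkout@v4
      - run: echo "running on the new Oracle Cloud runner"
```

With distinct labels, a job can only be picked up by the runner it names, so enabling the new runner does not change where existing jobs run.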

Related to

  • #2701 (cc @scottgerring)
  • #2789 (cc @cijothomas)

trask avatar Jun 06 '25 04:06 trask

I’ll let @cijothomas and @clhain explicitly answer the questions mentioned in this issue. From my perspective, a key criterion is selecting a solution that maximizes our chances of obtaining benchmark results with minimal external interference and dependencies, ensuring the results remain as stable as possible over time.

lquerel avatar Jun 06 '25 15:06 lquerel

Migrate all existing usages of runs-on: self-hosted to runs-on: equinix-bare-metal

PRs have been sent to all the repos that are using runs-on: self-hosted.

trask avatar Jun 06 '25 16:06 trask

@open-telemetry/python-maintainers @open-telemetry/javascript-maintainers can you review the two PRs below? we need all existing repos to migrate to the new label before we can move forward with adding a second self-hosted runner. thanks!

  • https://github.com/open-telemetry/opentelemetry-python/pull/4622
  • https://github.com/open-telemetry/opentelemetry-js/pull/5747

trask avatar Jun 10 '25 21:06 trask

Hey @trask cool!

Decide if we want to use container-based workflows in order to isolate environments, or do we want to run the jobs directly on the host, in which case what software do we need to install on the host?

We'd want unzip, rustup, and build-essential (assuming Ubuntu) at least. If you have the inclination, you could also clone opentelemetry-rust onto the node and run cargo criterion from the root of the project; this should make it clear if we're missing anything else.

Once you've done that I can switch our benchmark build on pushes to main back to run on the shared workers!

https://github.com/open-telemetry/opentelemetry-rust/blob/1f0d9a9f62a3f7829e6065191fa1c3d4065b269c/.github/workflows/benchmark.yml#L29-L32

scottgerring avatar Jun 12 '25 07:06 scottgerring

@scottgerring @cijothomas the new self-hosted runner is available now

you can see an example here: https://github.com/open-telemetry/sig-project-infra/pull/43/files

@scottgerring using a container-based workflow will give you the chance to pick a container that has the tools you need and will be more portable if we spin up more self-hosted runners in the future, let me know if that works
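
A container-based job on the new runner might look roughly like the sketch below. The image (rust:1), the job name, and the install step are assumptions for illustration, not what the linked example or the opentelemetry-rust workflow actually uses. Container jobs also assume Docker is available on the runner host.

```yaml
# Hypothetical job; the image and steps are assumptions for illustration.
jobs:
  benchmark:
    runs-on: oracle-bare-metal-64cpu-512gb-x86-64
    container:
      # Assumed image; pick one that has the tools the benchmarks need.
      image: rust:1
    steps:
      - uses: actions/checkout@v4
      # cargo-criterion is a separate cargo subcommand and must be installed.
      - run: cargo install cargo-criterion
      - run: cargo criterion
```

Because the toolchain comes from the image rather than the host, the same job definition should carry over to any additional self-hosted runners.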

trask avatar Jun 20 '25 00:06 trask

btw, I've given access to opentelemetry-rust and otel-arrow repos, with no restrictions on workflows / git refs for now so you can test it out via PRs like in the example above

once you have things working and merged then we can further restrict it

trask avatar Jun 20 '25 00:06 trask

Hey @trask - thanks for slogging away at this! I'm having a play with it now. I can see that the worker runs and will try to get the job going :)

scottgerring avatar Jun 20 '25 07:06 scottgerring

Hey @trask it lives! Here are runs from main on the dedicated workers:

https://github.com/open-telemetry/opentelemetry-rust/actions/workflows/benchmark.yml?query=event%3Apush

Thanks for your help, and feel free to remove the special blessed branches rule :)

scottgerring avatar Jul 04 '25 12:07 scottgerring