github-act-runner

Optionally bind ephemeral runner to particular GitHub job?

Open nwf opened this issue 3 years ago • 5 comments

Hello and thanks for github-act-runner. We're using it in a home-brew implementation of https://docs.github.com/en/actions/hosting-your-own-runners/autoscaling-with-self-hosted-runners . Our system dispatches work requests from GitHub to a pool of consumers that queue for such requests and, upon receipt, spawn an ephemeral github-act-runner. Curiously, if GitHub pushes two work requests our way (with two distinct workflow_job.id values), there seems to be no way for us to instruct the two resulting ephemeral runners to pick up the workflow_job that triggered their creation. This is fine at present, because all our workers are uniform, but it seems like the kind of thing that might occasionally result in strange logs: worker N constructs an ephemeral runner in response to job request X from repo R but ends up running job Y (which is, at least, guaranteed to also be from repo R), while worker M constructs an ephemeral runner for Y but ends up running X.

Are you aware of any way that we could plumb the workflow_job.id value down to the ephemeral runner and thence to GitHub when it asks to pick up a job on the repository? I'm sorry if this should be obvious and I have simply overlooked something in the documentation.

nwf avatar May 13 '22 10:05 nwf

I'm not aware of any way to tell the actions service to assign a job request to a specific runner. A runner has to accept any job request it receives; an ephemeral runner cannot even receive a second job.

> This is fine at present, because all our workers are uniform

I would suggest using different runner labels to distinguish different runner configurations or sizes.

runs-on: [ small, x64 ]
runs-on: [ big, x64, gpu ]
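To make the label suggestion concrete, here is a minimal sketch of the matching rule, assuming a self-hosted runner is eligible for a job only when it carries every label listed in runs-on. The function name and the label sets are illustrative, not part of github-act-runner.

```python
def runner_can_take(job_labels, runner_labels):
    """Return True if the runner carries every label the job requests.

    Assumption: eligibility is a plain subset test on label names.
    """
    return set(job_labels) <= set(runner_labels)


# Hypothetical runner configurations matching the two runs-on lines above.
small_runner = ["small", "x64"]
big_runner = ["big", "x64", "gpu"]

# A job asking for [small, x64] only fits the small runner.
print(runner_can_take(["small", "x64"], small_runner))
print(runner_can_take(["small", "x64"], big_runner))

# A job asking only for [x64] fits both, so either runner may pick it up.
print(runner_can_take(["x64"], small_runner))
print(runner_can_take(["x64"], big_runner))
```

Under this rule, a job that requests fewer labels can land on a runner that was sized for a more demanding job, which is why distinct, non-overlapping label sets keep configurations apart.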

It might be possible to do what you want by redesigning this app, for example:

  • One or more fake runners are registered
    • these fake runners would never execute any job themselves
    • they are hosted side by side with the scaling solution
    • they forward the requests to the specific ephemeral runner that was constructed
      • You would need a way to get the workflow_job.id from a job request; I don't think GitHub sends it to the runner.

ChristopherHX avatar May 13 '22 15:05 ChristopherHX

Yeah, we're using different runs-on for different configurations, so there's no real risk (I think), it just seemed surprising.

Thanks for the redesign sketch; I'll ponder such a thing.

nwf avatar May 13 '22 15:05 nwf

I think this can now be implemented for an autoscaler, but the autoscaler needs changes to be able to use it. It is currently in the test phase in #67.

  • workflow_job event - queued
    • configure an ephemeral runner and remember the name of the agent
    • run the ephemeral runner with a custom worker script, see #67 for the current draft
  • the ephemeral runner picks up a job, which can be the wrong one
    • this happens if a job with a subset of the requested labels is created at a similar time
    • the job with more labels might need to spawn another ephemeral runner, in case a job with fewer labels has stolen your runner instance
  • workflow_job event - in_progress (the agent name is in the payload; you can wait for this event from the custom worker script) / the ephemeral runner calls the worker script
    • these two events might happen in either order
  • the worker script pipes its stdin to the actual ephemeral worker
    • inside a jail, container or VM
    • the github-act-runner worker reads the job request
      • This process cannot request any job from GitHub Actions, since it only has credentials for the current job (it is not even required to be an ephemeral runner)
      • All outgoing communication is with GitHub Actions
      • stdin should be kept open to be able to process cancel events
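The bookkeeping in the steps above could be sketched roughly as follows. This is only an illustration, not the code from #67: it assumes the autoscaler records which workflow_job.id caused each runner to be spawned, and that the in_progress payload exposes the agent name as workflow_job.runner_name. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RunnerTracker:
    # runner name -> job id whose "queued" event caused the spawn
    spawned_for: dict = field(default_factory=dict)
    # runner name -> job id reported by the "in_progress" event
    running: dict = field(default_factory=dict)

    def record_spawn(self, runner_name, job_id):
        """Remember the agent name configured for a queued job."""
        self.spawned_for[runner_name] = job_id
        return self._check(runner_name)

    def on_workflow_job(self, payload):
        """Handle a workflow_job webhook payload; ignore all but in_progress."""
        if payload.get("action") != "in_progress":
            return None
        job = payload["workflow_job"]
        self.running[job["runner_name"]] = job["id"]
        return self._check(job["runner_name"])

    def _check(self, runner_name):
        """Return (runner, expected_job, actual_job) on a mismatch, else None."""
        want = self.spawned_for.get(runner_name)
        got = self.running.get(runner_name)
        if want is None or got is None:
            return None  # the other half has not arrived yet
        return None if want == got else (runner_name, want, got)


tracker = RunnerTracker()
tracker.record_spawn("ephemeral-1", 100)
stolen = tracker.on_workflow_job(
    {"action": "in_progress",
     "workflow_job": {"id": 200, "runner_name": "ephemeral-1"}}
)
print(stolen)
```

Because both halves are stored before they are compared, the check gives the same answer whichever of the two events arrives first. A returned mismatch is the signal that the job the runner was built for may still need another ephemeral runner.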

ChristopherHX avatar Jun 29 '22 20:06 ChristopherHX

I also created a PowerShell script that lets you use the same autoscaling technique with an actions/runner worker: https://github.com/ChristopherHX/github-act-runner/blob/d6476e3e43a870c3becd0eb1a745a36524ffce0a/compat/actions-runner-worker.ps1

A "Set up Worker" step is prepended to the job. It could be used to report progress if autoscaling takes more time, but this is not implemented yet. If your worker script returns an error exit code (a non-zero exit code), then stdout and stderr from the worker script are appended to the job as a failed step.

For example, to use a custom worker, run github-act-runner run -w bash -w /path/to/my/script; the worker is then invoked as [bash, /path/to/my/script].
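As a rough illustration of what such a worker script has to do, the sketch below forwards its own stdin to a child process and returns the child's exit code. It is written in Python rather than bash purely for illustration; the function name is hypothetical, and the command passed in (for instance a container or VM wrapper around the actual worker) is an assumption left to the autoscaler.

```python
import subprocess
import sys
import threading


def run_worker(cmd, stdin=None):
    """Start cmd, stream our stdin into it, and return its exit code.

    The pipe to the child stays open until our own stdin reaches EOF,
    so later messages such as cancel events can still be delivered.
    """
    source = stdin if stdin is not None else sys.stdin.buffer
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)

    def pump():
        try:
            # read1 returns as soon as some bytes are available.
            for chunk in iter(lambda: source.read1(65536), b""):
                proc.stdin.write(chunk)
                proc.stdin.flush()
        except BrokenPipeError:
            pass  # the child exited before reading everything
        finally:
            try:
                proc.stdin.close()
            except OSError:
                pass

    threading.Thread(target=pump, daemon=True).start()
    return proc.wait()
```

A script built on this would end with something like sys.exit(run_worker([...])), so that a non-zero exit code from the real worker propagates and is reported as a failed step.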

ChristopherHX avatar Jun 29 '22 20:06 ChristopherHX

Related to:

  • https://github.com/actions/runner/issues/620
  • https://github.com/actions/runner/issues/2106
  • https://github.com/community/community/discussions/19784

If actions/runner supports this, then we are ready to clone this functionality.

ChristopherHX avatar Sep 20 '22 19:09 ChristopherHX