No, it's German for "The Bootstrap, the"
The core of the buildkite agent (one of its cores, anyway) is a component currently called "The Bootstrap". This is the part of the agent that's actually responsible for running jobs, streaming their logs back to the buildkite mothership, and doing all the business of running hooks, finding plugins, doing git things, etc.
Were it only that simple.
What we call "the bootstrap" is actually three separate components from this repo's point of view:
- A CLI command, `buildkite-agent bootstrap`, which is what the agent calls when it gets a new job to run
- A go package called `bootstrap` that contains most of the code that gets run to run a job
- A go struct, `bootstrap.Bootstrap`, which holds the logic for job execution (though there are other peripheral job execution-related bits and bobs hanging around in the `bootstrap` package mentioned above)
These three things being named the same thing makes talking about them separately a pain; when talking about "the bootstrap", there's a variety of things that could be the subject of discussion.
Furthermore...
"Bootstrap" is a kind of a crappy name for what this thing does
There was a time, long ago, when this name probably fit. Fun fact: prior to v3 of the agent, the bootstrap used to be a bash script that the agent ran. At that point, the bootstrap was mostly responsible for standing up (bootstrapping, one might say) an environment in which a job (at the time, a bash script and nothing more) could run.
Times have changed however, and the bootstrap is now a (very) complex piece of go code responsible for orchestrating all of the various tasks that need to happen before, during and after a job run.
Okay, but why change it?
Simply put, the name is confusing and it means that when we talk about the bootstrap (which we usually mean as "the job execution thingy") to our colleagues and to our customers, there's context that's lost in translation.
The bootstrap is an incredibly important part - maybe the most important part - of a job's execution lifecycle, and we fairly regularly have need to talk to customers about it. Knowing what the bootstrap actually is requires knowledge of the agent's history though, and it makes talking about these things, and intuiting how the agent actually works, a lot harder.
Consider: If you were a buildkite customer and a bikkie said "oh that's a bootstrap error", what would you think the problem is? How about if they said (foreshadowing) "I think there's an error in the job executor"?
Cool. What have you done about it?
This PR is basically a big fancy find-and-replace. The gist of it is:
- The `buildkite-agent bootstrap` command is deprecated (but not removed) and replaced with `buildkite-agent run-job`. This new command is functionally identical to the existing one, with the only change being that it doesn't have a deprecation notice
- The `bootstrap` package has been renamed to `job`. This makes a lot of names clearer IMO - consider `bootstrap.Shell` vs `job.Shell`
- The `bootstrap.Bootstrap` struct has been renamed to `job.Executor`. This is more in line with what it actually does - it executes a job
None of these names are final - i'd love some feedback on them. Two hard things and all that.
Open Questions
- [ ] Is `job.Executor` too similar semantically to `agent.JobRunner`? My opinion is no, but it's not particularly strongly held
- [ ] Should we bother scrubbing all mention of the bootstrap from the repo or is it okay to leave some of them in there?
Still to do
- [x] Update `agent/job_runner.go` to:
  - [x] Use the new nomenclature
  - [x] Add a hook called `pre-exec`, identical to `pre-bootstrap` but with the shiny new name
  - [x] Add a deprecation warning to the `pre-bootstrap` hook??? should we just continue to allow it?
- [x] Another round of seek-and-destroy on instances of the text `bootstrap`. They're pervasive!
- [x] Local smoke testing to ensure that:
  - [x] The agent uses `buildkite-agent exec-job` as its job executor by default
  - [x] `buildkite-agent bootstrap` still works okay, but outputs a deprecation warning
  - [x] The agent's bootstrap can be overridden using both `--bootstrap-script` and `--job-executor-script`.
Naming bike-shedding: I wonder if run-job would match our other terminology closer than exec-job.
e.g. Job state will be running as a result of this not-bootstrap thing happening.
And when you look at it in Test Analytics, it'll be called a Run (I think).
Also, “exec” feels quite low-level syscall-ish, whereas the not-bootstrap does quite a lot of higher-level coordination before executing one-or-more processes/hooks/plugins/containers/things.
Apologies, I haven't looked/thought deeper about the PR more broadly, I only have this bike-shed right now 😅
Also: I was totally baited into looking at this by the excellent PR title 🤡
> I wonder if `run-job` would match our other terminology closer than `exec-job`
@pda i think i agree with you here - it's terser while also holding more information. how would you feel about renaming the command to run-job while keeping the struct in the job package job.Executor? There's a slight mismatch in naming there, but i think it nicely delineates between the internals and the externals (porcelain and plumbing in git terms, i guess)
> Also: I was totally baited into looking at this by the excellent PR title 🤡
my cunning plan has worked then
> how would you feel about renaming the command to run-job while keeping the struct in the job package job.Executor? There's a slight mismatch in naming there, but i think it nicely delineates between the internals and the externals (porcelain and plumbing in git terms, i guess)
Interesting question.
I'm a proponent of ubiquitous language; it'd be a shame to have two names for one thing.
One arguable argument against “runner” is that other platforms call their entire agent a “runner” (GitHub Actions, GitLab), and a subset of our customers will confuse it with that.
The other that you touched on is that we already have a component called JobRunner which lives in the agent outside the ~bootstrap~ executor/runner/thing.
I don't have the answers 🤷♂️
I wonder…
- `buildkite-agent start` (some people think of this as the “Buildkite self-hosted runner”)
  - loop: get jobs (specifically: Command Step jobs, aka Command Jobs)
    - internal JobRunner prepares & orchestrates running the Command Job
      - `buildkite-agent bootstrap` (rename to `run-job`?) subprocess
        - do the lifecycle of the Command Job; command, plugins, hooks etc
Maybe JobRunner becomes JobOrchestrator and bootstrap.Bootstrap becomes job.Runner? I don't love it.
Taking a step back from specifics…
- the main process gets a job and wants to run it, but doesn't know how or isn't capable of doing so directly; it delegates to another layer in a subprocess to actually run the job.
- that subprocess exists to run jobs, and knows how to run jobs.
Through that lens, the subprocess has a much stronger claim to “run job” or “job runner” naming, and the main process should find a different name that means “knows that a job needs running and knows how to ask a subprocess to run the job”.
Possible alternative names for agent.JobRunner (i.e. the bit that doesn't actually execute the job, it just kicks it off elsewhere)
- `agent.JobManager`
- ~`agent.JobForker`~
- `agent.JobOrchestrator`
- `agent.JobStarter`
- `agent.JobSupervisor`
- `agent.JobInvoker`
None of those feel great. What does it actually do?
- Starts the subprocess to run the job
- Collates the correct `env` to pass to that process
- Streams stdout / stderr / header times to the API
- Experimentally knows how to run jobs in k8s/etc instead of as a subprocess
The “k8s/etc” bit means it's not a JobForker.
I'd call it JobDispatcher except that means something different server-side, and the log streaming etc goes a bit beyond just “dispatching”.
agent.JobManager is okayish, to the extent that “manager” is ever a good name for a software component 😬
Maybe it's a JobInvoker but that's just adding yet another synonym for “run” / “execute”.
The fact that it's learning to run jobs in different ways (subprocess / k8s / …) feels important here. Again, “dispatcher” kind of suits that. So does “strategy”.
@pda very interesting thoughts 🤔 i agree with you that there remains some confusion about the role of the agent.JobRunner vs job.Executor, but how would you feel about making that change at a later date? my take is that the current setup makes things clearer, though maybe not as clear as they possibly could be, but it's a step in the right direction.
the good thing is that those names (job.Executor and agent.JobRunner) are both completely internal, and can be pretty easily changed
I quite like agent.JobSupervisor for the current agent.JobRunner. Prior art from supervisord.
I'd always imagined that we'd add "Executors", which would be strategies for executing the bootstrap; what we do now is a LocalShellExecutor or similar. We've built a DockerExecutor at CashApp, and I've built an AmazonECSExecutor in the past.
Finding the right name for the bootstrap is a real challenge. The architecture we've built at CashApp where we run the buildkite-agent bootstrap in a docker container (the logical extension of https://github.com/buildkite/docker-bootstrap-example) has really exposed the confusing-ness of the name. The bootstrap is almost not even part of the agent anymore, it could even be running on a totally different host depending on the executor.
What if you actually decoupled it from the buildkite-agent binary? What if it was a buildkite-agent-job-runtime? That also plays into the bk cli and wanting to run a job locally (which actually doesn't need an agent).
The other aspect here of what the bootstrap does is it manages phases (I wish I'd called this stages), hooks and plugins. I've frequently wanted more granular access to these things, for instance being able to call buildkite-agent bootstrap default-checkout-phase directly.
If I was pushed to pick a name for a straight sub-command rename, I'd actually aim to extract it out of the job subcommand to leave room for job commands that operate on the active job (the bootstrap does not in the same way that the other commands do). What about buildkite-agent job-runtime or buildkite-agent job-kernel execute? 😅