No, it's German for "The Bootstrap, the"

(joke context)

The core of the buildkite agent (one of its cores, anyway) is a component currently called "The Bootstrap". This is the part of the agent that's actually responsible for running jobs, streaming their logs back to the buildkite mothership, and doing all the business of running hooks, finding plugins, doing git things, etc.

Were it only that simple.

What we call "the bootstrap" is actually three separate components from this repo's point of view:

A CLI command, buildkite-agent bootstrap, which is what the agent calls when it gets a new job to run
A Go package called bootstrap that contains most of the code that runs a job
A Go struct, bootstrap.Bootstrap, which holds the logic for job execution (though there are other peripheral job execution-related bits and bobs hanging around in the bootstrap package mentioned above)

These three things being named the same thing makes talking about them separately a pain; when talking about "the bootstrap", there's a variety of things that could be the subject of discussion.

Furthermore...

"Bootstrap" is a kind of a crappy name for what this thing does

There was a time, long ago, when this name probably fit. Fun fact: prior to v3 of the agent, the bootstrap was a bash script that the agent ran. At that point, the bootstrap was mostly responsible for standing up (bootstrapping, one might say) an environment in which a job (at the time a bash script and nothing more) could run.

Times have changed, however, and the bootstrap is now a (very) complex piece of Go code responsible for orchestrating all of the various tasks that need to happen before, during and after a job run.

Okay, but why change it?

Simply put, the name is confusing and it means that when we talk about the bootstrap (which we usually mean as "the job execution thingy") to our colleagues and to our customers, there's context that's lost in translation.

The bootstrap is an incredibly important part - maybe the most important part - of a job's execution lifecycle, and we fairly regularly have need to talk to customers about it. Knowing what the bootstrap actually is requires knowledge of the agent's history though, and it makes talking about these things, and intuiting how the agent actually works, a lot harder.

Consider: If you were a buildkite customer and a bikkie said "oh that's a bootstrap error", what would you think the problem is? How about if they said (foreshadowing) "I think there's an error in the job executor"?

Cool. What have you done about it?

This PR is basically a big fancy find-and-replace. The gist of it is:

The buildkite-agent bootstrap command is deprecated (but not removed) and replaced with buildkite-agent run-job. This new command is functionally identical to the existing one, with the only change being that it doesn't have a deprecation notice
The bootstrap package has been renamed to job. This makes a lot of names clearer IMO - consider bootstrap.Shell vs job.Shell
The bootstrap.Bootstrap struct has been renamed to job.Executor. This is more in line with what it actually does - it executes a job

None of these names are final - i'd love some feedback on them. Two hard things and all that.
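
To make the "deprecated but not removed" idea concrete, here's a minimal sketch of one command name acting as a warning-emitting alias for another. This is not the agent's actual CLI wiring: `dispatch`, `runJob` and `result` are hypothetical names invented for illustration, and the warning text is made up.

```go
package main

import (
	"fmt"
	"os"
)

// result is what a command produced: the job output plus any warning.
// (Hypothetical type, for illustration only.)
type result struct {
	Output  string
	Warning string
}

// runJob stands in for the real job execution logic; here it only reports
// which job it was asked to run.
func runJob(jobID string) string {
	return "running job " + jobID
}

// dispatch routes a subcommand name to runJob. "bootstrap" is kept as a
// deprecated alias: it attaches a warning, then behaves exactly like "run-job".
func dispatch(command, jobID string) (result, error) {
	switch command {
	case "run-job":
		return result{Output: runJob(jobID)}, nil
	case "bootstrap":
		return result{
			Output:  runJob(jobID),
			Warning: "bootstrap is deprecated, use run-job instead",
		}, nil
	default:
		return result{}, fmt.Errorf("unknown command %q", command)
	}
}

func main() {
	for _, cmd := range []string{"run-job", "bootstrap"} {
		r, err := dispatch(cmd, "job-123")
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		if r.Warning != "" {
			fmt.Fprintln(os.Stderr, "warning:", r.Warning)
		}
		fmt.Println(r.Output)
	}
}
```

Both names go through the same `runJob` path, so the only observable difference is the warning.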

Open Questions

[ ] Is job.Executor too similar semantically to agent.JobRunner? My opinion is no, but it's not particularly strongly held
[ ] Should we bother scrubbing all mention of the bootstrap from the repo or is it okay to leave some of them in there?

Still to do

[x] Update agent/job_runner.go to:
- [x] Use the new nomenclature
- [x] Add a hook called pre-exec, identical to pre-bootstrap but with the shiny new name
- [x] Add a deprecation warning to the pre-bootstrap hook??? should we just continue to allow it?
[x] Another round of seek-and-destroy on instances of the text bootstrap. They're pervasive!
[x] Local smoke testing to ensure that:
- [x] The agent uses buildkite-agent exec-job as its job executor by default
- [x] buildkite-agent bootstrap still works okay, but outputs a deprecation warning
- [x] The agent's bootstrap can be overridden using both --bootstrap-script and --job-executor-script.
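
For the last smoke test item, one way the two override flags could be reconciled is sketched below. The precedence (new flag wins, old flag is honoured but flagged as deprecated) is an assumption, not something this PR states, and `Config`, `resolveExecutor` and the default command string are hypothetical.

```go
package main

import "fmt"

// Config holds the two override settings. The field names are hypothetical;
// they stand in for the values of --job-executor-script and --bootstrap-script.
type Config struct {
	JobExecutorScript string // value of --job-executor-script (new name)
	BootstrapScript   string // value of --bootstrap-script (old name)
}

// defaultExecutor is an assumed default, following the run-job naming used
// in the PR description.
const defaultExecutor = "buildkite-agent run-job"

// resolveExecutor picks the command used to run a job. Assumed precedence:
// the new flag, then the old flag, then the default. The second return value
// reports whether the deprecated flag was the one that took effect.
func resolveExecutor(c Config) (command string, usedDeprecated bool) {
	switch {
	case c.JobExecutorScript != "":
		return c.JobExecutorScript, false
	case c.BootstrapScript != "":
		return c.BootstrapScript, true
	default:
		return defaultExecutor, false
	}
}

func main() {
	configs := []Config{
		{},
		{BootstrapScript: "/opt/old-bootstrap.sh"},
		{BootstrapScript: "/opt/old-bootstrap.sh", JobExecutorScript: "/opt/executor.sh"},
	}
	for _, c := range configs {
		cmd, deprecated := resolveExecutor(c)
		fmt.Printf("command=%q deprecated=%v\n", cmd, deprecated)
	}
}
```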

Feb 16 '23 11:02 moskyb

Naming bike-shedding: I wonder if run-job would match our other terminology closer than exec-job. e.g. Job state will be running as a result of this not-bootstrap thing happening. And when you look at it in Test Analytics, it'll be called a Run (I think). Also, “exec” feels quite low-level syscall-ish, whereas the not-bootstrap does quite a lot of higher-level coordination before executing one-or-more processes/hooks/plugins/containers/things.

Apologies, I haven't looked/thought deeper about the PR more broadly, I only have this bike-shed right now 😅

Feb 24 '23 00:02 pda

Also: I was totally baited into looking at this by the excellent PR title 🤡

Feb 24 '23 00:02 pda

> I wonder if run-job would match our other terminology closer than exec-job

@pda i think i agree with you here - it's terser while also holding more information. how would you feel about renaming the command to run-job while keeping the struct in the job package job.Executor? There's a slight mismatch in naming there, but i think it nicely delineates between the internals and the externals (porcelain and plumbing in git terms, i guess)

> Also: I was totally baited into looking at this by the excellent PR title 🤡

my cunning plan has worked then

Feb 26 '23 21:02 moskyb

> how would you feel about renaming the command to run-job while keeping the struct in the job package job.Executor? There's a slight mismatch in naming there, but i think it nicely delineates between the internals and the externals (porcelain and plumbing in git terms, i guess)

Interesting question.

I'm a proponent of ubiquitous language; it'd be a shame to have two names for one thing.

One arguable argument against “runner” is that other platforms call their entire agent a “runner” (GitHub Actions, GitLab), and a subset of our customers will confuse it with that.

The other that you touched on is that we already have a component called JobRunner which lives in the agent outside the ~bootstrap~ executor/runner/thing.

I don't have the answers 🤷‍♂️

I wonder…

buildkite-agent start (some people think of this as the “Buildkite self-hosted runner”)
- loop: get jobs (specifically: Command Step jobs, aka Command Jobs)
  - internal JobRunner prepares & orchestrates running the Command Job
    - buildkite-agent bootstrap (rename to run-job?) subprocess
      - do the lifecycle of the Command Job; command, plugins, hooks etc

Maybe JobRunner becomes JobOrchestrator and bootstrap.Bootstrap becomes job.Runner? I don't love it.

Taking a step back from specifics…

the main process gets a job and wants to run it, but doesn't know how or isn't capable of doing so directly; it delegates to another layer in a subprocess to actually run the job.
that subprocess exists to run jobs, and knows how to run jobs.

Through that lens, the subprocess has a much stronger claim to “run job” or “job runner” naming, and the main process should find a different name that means “knows that a job needs running and knows how to ask a subprocess to run the job”.

Mar 02 '23 23:03 pda

Possible alternative names for agent.JobRunner (i.e. the bit that doesn't actually execute the job, it just kicks it off elsewhere)

agent.JobManager
~agent.JobForker~
agent.JobOrchestrator
agent.JobStarter
agent.JobSupervisor
agent.JobInvoker

None of those feel great. What does it actually do?

Starts the subprocess to run the job
- Collates the correct env to pass to that process
Streams stdout / stderr / header times to the API
Experimentally knows how to run jobs in k8s/etc instead of as a subprocess

The “k8s/etc” bit means it's not a JobForker.

I'd call it JobDispatcher except that means something different server-side, and the log streaming etc goes a bit beyond just “dispatching”.

agent.JobManager is okayish, to the extent that “manager” is ever a good name for a software component 😬

Maybe it's a JobInvoker but that's just adding yet another synonym for “run” / “execute”.

The fact that it's learning to run jobs in different ways (subprocess / k8s / …) feels important here. Again, “dispatcher” kind of suits that. So does “strategy”.

Mar 02 '23 23:03 pda

@pda very interesting thoughts 🤔 i agree with you that there remains some confusion about the role of the agent.JobRunner vs job.Executor, but how would you feel about making that change at a later date? my take is that the current setup makes things clearer, though maybe not as clear as they possibly could be, but it's a step in the right direction.

the good thing is that those names (job.Executor and agent.JobRunner) are both completely internal, and can be pretty easily changed

Mar 03 '23 01:03 moskyb

I quite like agent.JobSupervisor for the current agent.JobRunner. Prior art from supervisord.

I'd always imagined that we'd add "Executors" which were strategies for executing the bootstrap; what we do now is a LocalShellExecutor or similar. We've built a DockerExecutor at CashApp, I've built an AmazonECSExecutor in the past.
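
A rough sketch of what "executors as strategies" could look like, purely to illustrate the shape of the idea. None of this is the agent's real API: the `Executor` interface, `Job`, `JobSupervisor`, the image name and the exact command lines are all hypothetical, and nothing here actually launches a process.

```go
package main

import (
	"fmt"
	"strings"
)

// Job is a hypothetical, minimal description of a job to run.
type Job struct {
	ID string
}

// Executor is a hypothetical strategy describing where the bootstrap runs.
// CommandLine returns the argv a supervisor would launch for the given job.
type Executor interface {
	CommandLine(job Job) []string
}

// LocalShellExecutor runs the bootstrap as a plain local subprocess.
type LocalShellExecutor struct{}

func (LocalShellExecutor) CommandLine(job Job) []string {
	return []string{"buildkite-agent", "bootstrap"}
}

// DockerExecutor runs the bootstrap inside a container, passing the job ID
// through as an environment variable.
type DockerExecutor struct {
	Image string
}

func (d DockerExecutor) CommandLine(job Job) []string {
	return []string{
		"docker", "run", "--rm",
		"--env", "BUILDKITE_JOB_ID=" + job.ID,
		d.Image,
		"buildkite-agent", "bootstrap",
	}
}

// JobSupervisor (the name floated above for agent.JobRunner) only decides
// what to launch; the chosen Executor decides how.
type JobSupervisor struct {
	Executor Executor
}

// Plan returns the command line the supervisor would start for the job.
func (s JobSupervisor) Plan(job Job) []string {
	return s.Executor.CommandLine(job)
}

func main() {
	job := Job{ID: "job-123"}
	executors := []Executor{
		LocalShellExecutor{},
		DockerExecutor{Image: "example/agent:latest"},
	}
	for _, e := range executors {
		s := JobSupervisor{Executor: e}
		fmt.Println(strings.Join(s.Plan(job), " "))
	}
}
```

The supervisor stays the same whichever strategy is plugged in, which is roughly why "the bootstrap" can end up on a totally different host.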

Finding the right name for the bootstrap is a real challenge. The architecture we've built at CashApp where we run the buildkite-agent bootstrap in a docker container (the logical extension of https://github.com/buildkite/docker-bootstrap-example) has really exposed the confusing-ness of the name. The bootstrap is almost not even part of the agent anymore, it could even be running on a totally different host depending on the executor.

What if you actually decoupled it from the buildkite-agent binary? What if it was a buildkite-agent-job-runtime? That also plays into the bk cli and wanting to run a job locally (which actually doesn't need an agent).

The other aspect here of what the bootstrap does is it manages phases (I wish I'd called this stages), hooks and plugins. I've frequently wanted more granular access to these things, for instance being able to call buildkite-agent bootstrap default-checkout-phase directly.

If I was pushed to pick a name for a straight sub-command rename, I'd actually aim to extract it out of the job subcommand to leave room for job commands that operate on the active job (the bootstrap does not in the same way that the other commands do). What about buildkite-agent job-runtime or buildkite-agent job-kernel execute? 😅

Jun 01 '23 07:06 lox
