Add repository consolidation
Consolidating jobs using the same repository into a single build directory
The idea here is that with large repositories, we waste a lot of space on our nodes when we have numerous pipelines. This is because the agent creates build dirs based on the `pipeline-slug`. We could save a lot of space, and avoid having to limit the number of pipelines we create, by having all builds of the same repo use the same build directory.
This would be really helpful for our org, so I imagine other teams with large repositories could benefit as well.
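To make the idea concrete, here is a minimal sketch (not the code in this PR) of how a consolidated build directory could be derived from the repository URL rather than the pipeline slug, assuming a SHA-1 hash is used to keep the directory name alphanumeric; the function and path names are illustrative only:

```go
// Sketch only: derive a shared build directory from the repository URL
// instead of the pipeline slug, so every pipeline that builds the same
// repo resolves to the same checkout path. SHA-1 is an assumption here,
// chosen just to keep the directory name alphanumeric and stable.
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// consolidatedBuildPath returns the same directory for every pipeline that
// builds the given repository, regardless of pipeline slug.
func consolidatedBuildPath(buildsDir, agentName, repositoryURL string) string {
	sum := sha1.Sum([]byte(repositoryURL))
	repoDir := hex.EncodeToString(sum[:]) // alphanumeric, stable per repository
	return filepath.Join(buildsDir, agentName, repoDir)
}

func main() {
	// Two pipelines pointing at the same repository resolve to one checkout dir.
	fmt.Println(consolidatedBuildPath("/buildkite/builds", "agent-1", "git@github.com:acme/big-repo.git"))
	fmt.Println(consolidatedBuildPath("/buildkite/builds", "agent-1", "git@github.com:acme/big-repo.git"))
}
```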
Things I'd like feedback on
- [ ] Language: specifically whether "Consolidate all builds using identical repositories into a single build directory" is the right wording convention for this functionality.
- [ ] Hashing: this approach ensures the dir is alphanumeric, but it certainly obscures what the build dir is. Thoughts? If this is the right approach, we could also move it into utils.
- [ ] Protected Env: I currently have this as a protected env variable so that it must be configured at the agent level; I think that's the right approach. Open to discussion, though.
Next steps
- If we get sign-off on this, I'd of course like to add tests.
@pda Would you mind reviewing this?
Sorry it's taken us so long to get back to you @pyasi! My main concern with this is that it limits concurrent access by lots of pipelines to the one checkout 🤔
Given that the `git-mirrors` experiment uses a reference clone of a single repository folder, does that mitigate some of the concerns, or are there other factors?
@lox all pipelines using the same repo will use the same checkout, correct, but since it's still `PATH/TO/BUILDS_DIR/BUILDKITE_AGENT-1/repo`, wouldn't that mean there are never concurrent workloads running on the same checkout dir? Unless a single agent can run two jobs concurrently for multiple pipelines.
The `git-mirrors` experiment doesn't quite solve this because it still requires a "clone" of the repo for each pipeline. So let's say you have a 1 gig repo with 10 pipelines, that adds up quite fast.
We've solved this internally through a hook so no rush on this. But let me know if the above makes sense as a response to the concurrency issue you spotted.
> The `git-mirrors` experiment doesn't quite solve this because it still requires a "clone" of the repo for each pipeline. So let's say you have a 1 gig repo with 10 pipelines, that adds up quite fast.
Interesting, you have a repo that is 1 GB checked out? (i.e. not including the git history)
We used reference repos and would pull from them instead of from the remote. This was to reduce traffic on GHE and speed up clone times, so it's not identical to the `git-mirrors` experiment. I vaguely remember looking into the experiment a while back and it wasn't solving our primary problem, IIRC. I'd have to dig up the reason again though.
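For context on the internal hook mentioned above, here is a rough sketch (an assumption, not our actual hook) of the reference-repo pattern: one local mirror per repository that jobs fetch from, with working checkouts borrowing objects from it via `git clone --reference`. All paths and names are hypothetical:

```go
// Sketch of a reference-repo workflow: keep one bare mirror per repository on
// the host, refresh it locally instead of fetching from the remote in every
// job, and point working checkouts at it with --reference so object data is
// shared. This approximates the approach described above; it is not the
// actual internal hook.
package main

import (
	"log"
	"os"
	"os/exec"
	"path/filepath"
)

func run(name string, args ...string) {
	cmd := exec.Command(name, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatalf("%s %v: %v", name, args, err)
	}
}

func main() {
	// Hypothetical paths for illustration only.
	remote := "git@github.com:acme/big-repo.git"
	mirror := "/buildkite/git-references/big-repo.git"
	checkout := "/buildkite/builds/agent-1/big-repo"

	if _, err := os.Stat(mirror); os.IsNotExist(err) {
		// One bare mirror per repository, shared by every pipeline on the host.
		run("git", "clone", "--mirror", remote, mirror)
	} else {
		// Refresh the mirror locally rather than fetching from the remote per job.
		run("git", "--git-dir", mirror, "fetch", "--prune", "origin")
	}

	if _, err := os.Stat(filepath.Join(checkout, ".git")); os.IsNotExist(err) {
		// The working checkout borrows objects from the mirror via --reference.
		run("git", "clone", "--reference", mirror, remote, checkout)
	}
}
```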