Support monorepos in github
It is a bit open ended, but it would be great if Buildkite provided better support for monorepos. That is, a single GitHub repo with per-project subdirectories that have separate project-specific pipelines.
Ideal setup
- support running a separate pipeline (maybe even more than one) for each project in the monorepo
- if a pull request only affects one project, only run that one pipeline
- if a pull request affects two projects, run pipelines for both
My hacky solution
I managed to almost get there by doing this:
- set up a separate pipeline for every project in the Buildkite console, all pointing to the same repo. You'll have to add a webhook
- set up .buildkite/pipeline.yml per project
- have one dummy project with a no-op .buildkite/pipeline.yml
- set up a repository-wide post-checkout hook, like the one suggested in #256, that looks at changed files and either goes to the project dir or, if there are no changes to a project, goes to a directory with a dummy empty pipeline:
#!/bin/bash
set -e -o pipefail
if [[ ! -z "${PROJECT_DIR}" ]] ; then
    if [[ -z "${BUILDKITE_PULL_REQUEST_BASE_BRANCH}" ]] ; then
        # Run all tests if not testing a pull request
        cd "${PROJECT_DIR}"
    else
        # On pull requests, only run tests for affected projects
        if git --no-pager diff --name-only "origin/${BUILDKITE_PULL_REQUEST_BASE_BRANCH}..HEAD" | grep "^${PROJECT_DIR%%/}/" ; then
            cd "${PROJECT_DIR}"
        else
            echo "skipping build: no changes for ${PROJECT_DIR}"
            cd .buildkite/dummy_pipeline/
        fi
    fi
fi
Now, the only annoying bit with this setup is that you get a status badge for every pipeline on every PR, even though some of them correspond to skipped (dummy) builds. That quickly becomes pretty noisy UX-wise.
Ideal solution
I can think of a few ways Buildkite could solve this in a better way:
- Allow skipping builds entirely from the post-checkout hook (without doing that dummy pipeline thing above). That'd be ideal, though I feel like it goes a bit against your architecture, in the sense that by the time the post-checkout hook is executed, the build is considered started: you can cancel it but cannot pretend it never existed in the first place.
- Allow enabling/disabling the GitHub status per step from the pipeline config (see https://github.com/buildkite/agent/issues/374 )
- Skip builds at the webhook level. Imagine having a webhook proxy that'd only pass a webhook on to a Buildkite pipeline if the particular PR affects that pipeline's project. That'd be the easiest option to use; however, I think that'd require Buildkite to have GitHub permissions to access my code (which I'm personally totally fine with).
Over here, we're investing in https://bazel.build for our build automation within our monorepo - that will become the thing that decides which work is necessary based on its knowledge of where changes are. Otherwise, I haven't thought of a safe way for a tool / product to do this, without an equivalent of the declared-dependencies that bazel BUILD files provide.
That's a good point, generally I think you'd indeed want something like bazel for monorepo builds. I don't think Buildkite will ever be able to solve that entirely.
However, in a bunch of cases I've seen, monorepo subprojects are separate microservices, and they don't depend on other subprojects as libraries. In this case figuring out what to build can be done reliably by just looking at the list of changed files.
Most basic use case for me is basically: "run this linter script on any changes to this subdirectory (but don't spam github status section on unrelated pull requests)".
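To make the "just look at the list of changed files" idea concrete, here is a minimal sketch. It assumes each project lives in its own top-level directory; `affected_projects` is a hypothetical helper name, not something Buildkite provides.

```shell
#!/bin/bash
# Sketch only: affected_projects is a hypothetical helper, and the
# one-project-per-top-level-directory layout is an assumption.
# Reads changed file paths (one per line) on stdin and prints each
# top-level directory that contains at least one changed file.
affected_projects() {
  # Drop files in the repository root, keep the first path component,
  # then de-duplicate.
  grep '/' | cut -d/ -f1 | sort -u
}
```

It could be fed from something like `git --no-pager diff --name-only "origin/${BUILDKITE_PULL_REQUEST_BASE_BRANCH}..HEAD" | affected_projects`. Note that it treats directories such as .buildkite as projects too, so a real setup would probably filter the result against a known project list.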
Really great summary of places for improvement, and discussion. Thank you!
There’s also https://github.com/chronotc/monorepo-diff-buildkite-plugin by @chronotc if you haven’t seen it yet.
These are great resources - I have a somewhat related question:
Is it possible to configure buildkite agent to not check out repositories multiple times (once for each pipeline) but only once? Goal is to save disk space. For large repositories it's not impossible to run out and those large fast SSDs are still on the expensive side.
@sschaetz if you set the agent’s git-clone-flags option to include a --reference-if-able /some/dir then you can speed things up dramatically using a local checkout cache (which maybe you could pull every hour?). This flag can be changed from a plugin using the pre-checkout hook too.
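As a rough sketch of the pre-checkout variant: to my understanding the git-clone-flags option is also exposed as the BUILDKITE_GIT_CLONE_FLAGS environment variable, so a hook can export it. `REFERENCE_REPO` and `reference_clone_flags` are hypothetical names, and /some/dir is the placeholder path from the comment above.

```shell
#!/bin/bash
# Hypothetical pre-checkout hook sketch. REFERENCE_REPO and
# reference_clone_flags are illustrative names, not agent features.

# Builds the clone flags: the agent's usual "-v" plus a reference repo.
reference_clone_flags() {
  local reference_dir="$1"
  echo "-v --reference-if-able ${reference_dir}"
}

# Assumed location of the local checkout cache.
REFERENCE_REPO="${REFERENCE_REPO:-/some/dir}"

# Hooks are sourced by the agent, so this export affects the checkout.
export BUILDKITE_GIT_CLONE_FLAGS="$(reference_clone_flags "${REFERENCE_REPO}")"
```

Because git borrows objects from the reference repository instead of copying them, each per-pipeline checkout should take less disk space as well as less time. The cache itself could be refreshed periodically with something like `git -C /some/dir fetch --prune`.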
There's also @sj26’s https://github.com/sj26/git-worktree-buildkite-hooks which is probably begging to be turned into a plugin, though it overrides the checkout hook which means it can get out of sync with features that are built into the bootstrap.
Our solution is to create a triggering pipeline plus a sub-pipeline for each subrepo. In the triggering pipeline, we set up the only webhook and then run a script that generates a dynamic pipeline whose steps trigger all the sub-pipelines that need to run, based on a JSON file mapping pipelines to directories. The benefits of doing it this way are that the sub-pipelines won't accumulate a lot of unnecessary triggered builds in their history and the badges still work.
We use the following code to determine whether a sub-pipeline should be triggered.
env git --no-pager diff --name-only "${last_build_revision}..${BUILDKITE_COMMIT}" | \
    grep -vE "${exclude_regex}" | \
    grep -E "${include_regex}" > /dev/null
And we use the following code to generate the steps.
if is-affected "${last_build_revision}" "${include_folders}" "${exclude_folders}" ; then
    # Emit one trigger step; label and build must align under the trigger key
    echo "  - trigger: '${pipeline}'"
    echo "    label: ':rocket: Trigger ${pipeline}'"
    echo "    build: { commit: '${BUILDKITE_COMMIT}', branch: '${BUILDKITE_BRANCH}' }"
fi
Our JSON looks like this:
{
  "buildkite-example": {
    "include": [
      "buildkite-example"
    ],
    "exclude": [
      "buildkite-example/excludes"
    ]
  }
}
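Putting the pieces above together, a self-contained sketch might look like the following. `is_affected` and `emit_trigger_step` are hypothetical helper names, and turning the JSON folder names into anchored regexes is my assumption; the original script may build them differently. Changed files are read from stdin so the logic can be tried without a git checkout.

```shell
#!/bin/bash
# Sketch only: is_affected and emit_trigger_step are hypothetical helpers.

# Succeeds if any changed file (one per line on stdin) matches
# include_regex and does not match exclude_regex.
is_affected() {
  local include_regex="$1" exclude_regex="$2"
  if [[ -n "${exclude_regex}" ]]; then
    grep -vE "${exclude_regex}" | grep -qE "${include_regex}"
  else
    # An empty exclude pattern would match (and drop) every line,
    # so skip the exclude filter entirely in that case.
    grep -qE "${include_regex}"
  fi
}

# Prints one trigger step for the given pipeline slug.
emit_trigger_step() {
  local pipeline="$1"
  echo "  - trigger: '${pipeline}'"
  echo "    label: ':rocket: Trigger ${pipeline}'"
  echo "    build: { commit: '${BUILDKITE_COMMIT}', branch: '${BUILDKITE_BRANCH}' }"
}
```

For the mapping shown above, the call would be roughly `git --no-pager diff --name-only "${last_build_revision}..${BUILDKITE_COMMIT}" | is_affected '^buildkite-example/' '^buildkite-example/excludes/'`. The emitted steps, preceded by a `steps:` line, can then be piped into `buildkite-agent pipeline upload`.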
At @codercom, we have a dynamic pipeline that, based on the directories changed, injects the appropriate steps into the pipeline to test/deploy them.
We're probably going to move to a solution like @SanCoder-Q's though, where the root pipeline generates triggers for all pipelines that need to run based on the changed directories.
I was just speaking with @buddyspike at DevOps Day conference in Sydney about https://github.com/mbtproject/mbt. Might be a useful tool to look at for mono-repos.
There are lots of things in this issue that we'd like to build, but for now dynamically uploading trigger steps that execute specific pipelines for each module seems like a good approach.
I'd love to see the following building blocks:
- A buildkite-agent skip command that marks a build as skipped
- A pre-accept hook in the agent that allows code to be run before the agent officially accepts it (perhaps skip is allowed here?)
- Make the changed files information from the GitHub webhooks available to builds, OR make it easier to figure out what base commit (last successful build?) to use for a local git operation. Currently, this needs to use an API call.
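For reference, the API call for the last point could be sketched like this, using the Buildkite REST API's build listing filtered to passed builds. `last_passed_build_url`, `last_passed_commit` and the `BUILDKITE_API_TOKEN` variable name are assumptions for illustration; the second function needs curl, jq and network access.

```shell
#!/bin/bash
# Sketch only: the helper names and the BUILDKITE_API_TOKEN variable
# are assumptions, not something the agent provides.

# Prints the REST API URL listing the most recent passed build of a
# pipeline on a branch. The branch is not URL-encoded here.
last_passed_build_url() {
  local org="$1" pipeline="$2" branch="$3"
  echo "https://api.buildkite.com/v2/organizations/${org}/pipelines/${pipeline}/builds?branch=${branch}&state=passed&per_page=1"
}

# Prints the commit of that build, or nothing if there is none.
last_passed_commit() {
  curl -fsS -H "Authorization: Bearer ${BUILDKITE_API_TOKEN}" \
    "$(last_passed_build_url "$@")" | jq -r '.[0].commit // empty'
}
```

Inside a job this might be called as `last_passed_commit "${BUILDKITE_ORGANIZATION_SLUG}" "${BUILDKITE_PIPELINE_SLUG}" "${BUILDKITE_BRANCH}"`, and the result used as the base for `git diff --name-only`.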
We currently use buildkite with a mobile monorepo where the setup looks like this:
- We have a single pipeline
- We use the webhook to upload our checked in config that only has one step to call run.sh
- If an environment variable isn't set, we run a Python script that hits the GitHub API and checks what builds need to be triggered
- From the Python script we start N jobs with the API, setting environment variables that specify what needs to happen
- These jobs go back through run.sh but hit a different path to actually run the build
This has a few small downsides:
- GitHub statuses are best managed by each individual build since the per-job handling doesn't really fit this case
- We end up creating N builds per PR, instead of 1 build with N jobs (I haven't checked if the latter is possible even)
- If we wanted to have significantly different agent targeting rules I'm not sure if this is possible since right now we always use the same single step with the same script and same targeting
But overall this works great for us!
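The two paths through run.sh could look roughly like the sketch below. Everything named here (`BUILD_TARGET`, `trigger_builds.py`, `build_target.sh`) is a hypothetical stand-in, since the actual script and variable names aren't shown above.

```shell
#!/bin/bash
# Sketch only: BUILD_TARGET, trigger_builds.py and build_target.sh are
# hypothetical names standing in for the real ones.

# Decides which path this invocation of run.sh should take.
run_mode() {
  if [[ -z "${BUILD_TARGET:-}" ]]; then
    echo "dispatch"   # first pass: work out what to build and fan out
  else
    echo "build"      # triggered pass: actually run one build
  fi
}

main() {
  case "$(run_mode)" in
    dispatch) python3 ./trigger_builds.py ;;
    build) ./build_target.sh "${BUILD_TARGET}" ;;
  esac
}
```

run.sh would end with a call to `main`; it is left out here so the functions can be sourced and tried on their own.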
Hey @SanCoder-Q, how are you handling last_build_revision? Are you saving this state somewhere in buildkite itself?