Support monorepos in github

Open oavdeev opened this issue 6 years ago • 10 comments

It is a bit open ended, but it would be great if Buildkite provided better support for monorepos. That is, a single GitHub repo with per-project subdirectories that have separate project-specific pipelines.

Ideal setup

  1. support running a separate pipeline (maybe even more than one) for each project in the monorepo
  2. if a pull request only affects one project, only run that one pipeline
  3. if a pull request affects two projects, run pipelines for both

My hacky solution

I managed to almost get there by doing this:

  • set up a separate pipeline for every project in the Buildkite console, all pointing to the same repo. You'll have to add a webhook
  • set up .buildkite/pipeline.yml per project
  • have one dummy project with a no-op .buildkite/pipeline.yml
  • set up a repository-wide post-checkout hook, like the one suggested in #256, that looks at the changed files and either goes to the project dir or, if there are no changes to the project, goes to a directory with a dummy empty pipeline:
#!/bin/bash
set -e -o pipefail

# Only act when PROJECT_DIR is set
if [[ -n "${PROJECT_DIR}" ]] ; then
    if [[ -z "${BUILDKITE_PULL_REQUEST_BASE_BRANCH}" ]] ; then
        # Run all tests if not testing a pull request
        cd "${PROJECT_DIR}"
    else
        # On pull requests, only run tests for affected projects
        if git --no-pager diff --name-only "origin/${BUILDKITE_PULL_REQUEST_BASE_BRANCH}..HEAD" | grep "^${PROJECT_DIR%%/}/" ; then
            cd "${PROJECT_DIR}"
        else
            echo "skipping build: no changes for ${PROJECT_DIR}"
            cd .buildkite/dummy_pipeline/
        fi
    fi
fi

Now, the only annoying bit with this setup is that you get a status badge for every pipeline on every PR, even though some of them correspond to skipped (dummy) builds. That quickly becomes pretty noisy UX-wise.

Ideal solution

I can think of a few ways Buildkite could solve this in a better way:

  1. Allow skipping builds entirely from the post-checkout hook (without the dummy pipeline workaround above). That'd be ideal, though I feel like it goes a bit against your architecture, in the sense that by the time the post-checkout hook is executed, the build is considered started -- you can cancel it but cannot pretend it never existed in the first place.

  2. Allow enabling/disabling the GitHub status per step from the pipeline config (see https://github.com/buildkite/agent/issues/374)

  3. Skip builds at the webhook level. Imagine having a webhook proxy that'd only pass a webhook on to a Buildkite pipeline if the particular PR affects this pipeline's project. That'd be the easiest option to use, however I think that'd require Buildkite to have GitHub permissions to access my code (which I'm personally totally fine with).

oavdeev avatar Jun 15 '18 19:06 oavdeev

Over here, we're investing in https://bazel.build for our build automation within our monorepo - that will become the thing that decides which work is necessary based on its knowledge of where changes are. Otherwise, I haven't thought of a safe way for a tool / product to do this, without an equivalent of the declared-dependencies that bazel BUILD files provide.

petemounce avatar Jun 18 '18 09:06 petemounce

That's a good point, generally I think you'd indeed want something like bazel for monorepo builds. I don't think Buildkite will ever be able to solve that entirely.

However, in a bunch of cases I've seen, monorepo subprojects are separate microservices, and they don't depend on other subprojects as libraries. In this case figuring out what to build can be done reliably by just looking at the list of changed files.
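For the independent-microservices case, a minimal sketch of "looking at the list of changed files" could look like the following. It assumes each project lives in its own top-level directory; the function name `changed_projects` is made up for illustration.

```shell
#!/bin/sh
# Sketch: map changed file paths (one per line on stdin) to the set of
# top-level project directories they belong to.
# Assumption: every project lives in its own top-level directory.
changed_projects() {
    # Keep only paths containing a slash (files at the repo root belong to
    # no project), keep the first path component, and de-duplicate.
    grep '/' | cut -d/ -f1 | sort -u
}

# In a real build the input would come from git, for example:
#   git --no-pager diff --name-only "origin/${BUILDKITE_PULL_REQUEST_BASE_BRANCH}..HEAD" | changed_projects
```

This only holds while subprojects do not depend on each other; once they share libraries, a dependency-aware tool such as Bazel is the safer choice, as noted above.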

The most basic use case for me is: "run this linter script on any changes to this subdirectory (but don't spam the GitHub status section on unrelated pull requests)".

oavdeev avatar Jun 18 '18 22:06 oavdeev

Really great summary of places for improvement, and discussion. Thank you!

There’s also https://github.com/chronotc/monorepo-diff-buildkite-plugin by @chronotc if you haven’t seen it yet.

toolmantim avatar Jun 18 '18 23:06 toolmantim

These are great resources - I have a somewhat related question:

Is it possible to configure buildkite agent to not check out repositories multiple times (once for each pipeline) but only once? Goal is to save disk space. For large repositories it's not impossible to run out and those large fast SSDs are still on the expensive side.

sschaetz avatar Jul 07 '18 17:07 sschaetz

@sschaetz if you set the agent’s git-clone-flags option to include a --reference-if-able /some/dir then you can speed things up dramatically using a local checkout cache (which maybe you could pull every hour?). This flag can be changed from a plugin using the pre-checkout hook too.
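As a rough sketch, a pre-checkout hook that sets the flag might look like this. `BUILDKITE_GIT_CLONE_FLAGS` is the environment-variable form of the `git-clone-flags` option; `REFERENCE_REPO_DIR` is a hypothetical variable introduced here, and `/some/dir` is the placeholder path from the comment above. Keeping the reference checkout up to date is left out.

```shell
#!/bin/sh
# Sketch of a pre-checkout hook. Hooks are sourced by the agent, so the
# exported variable is visible to the checkout phase.
# Assumption: REFERENCE_REPO_DIR (hypothetical) names a local checkout of the
# same repository that is refreshed by some other job.
reference_dir="${REFERENCE_REPO_DIR:-/some/dir}"

# --reference-if-able borrows objects from the local checkout when it exists;
# a missing directory is skipped with a warning instead of aborting the clone.
export BUILDKITE_GIT_CLONE_FLAGS="-v --reference-if-able ${reference_dir}"
```

The same value could instead be set once in the agent configuration file if every pipeline on the agent should use the cache.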

There's also @sj26’s https://github.com/sj26/git-worktree-buildkite-hooks which is probably begging to be turned into a plugin, though it overrides the checkout hook which means it can get out of sync with features that are built into the bootstrap.

toolmantim avatar Jul 08 '18 09:07 toolmantim

Our solution is to create one triggering pipeline and a sub-pipeline for each subrepo. The triggering pipeline has the only webhook; it runs a script that generates a dynamic pipeline containing, as trigger steps, all the sub-pipelines that need to run, based on a JSON file that maps pipelines to directories. The benefits of doing it this way are that sub-pipelines don't accumulate a lot of needlessly triggered builds in their history and the badges still work.

We use the following code to determine whether a sub-pipeline should be triggered.

env git --no-pager diff --name-only "${last_build_revision}..${BUILDKITE_COMMIT}" | \
    grep -vE "${exclude_regex}" | \
    grep -E "${include_regex}" > /dev/null

And we use the following code to generate the steps.

  if is-affected "${last_build_revision}" "${include_folders}" "${exclude_folders}" ; then
    echo "  - trigger: '${pipeline}'"
    echo "    label: ':rocket: Trigger ${pipeline}'"
    echo "    build: { commit: ${BUILDKITE_COMMIT}, branch: '${BUILDKITE_BRANCH}' }"
  fi

Our JSON looks like this:

{
  "buildkite-example": {
    "include": [
      "buildkite-example"
    ],
    "exclude": [
      "buildkite-example/excludes"
    ]
  }
}
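The snippets above do not show how the include and exclude folders become regexes, so here is one possible sketch of that step. It is not the author's actual script: `folders_to_regex` and `is_affected` are hypothetical names, folders are passed as space-separated lists rather than read from the JSON, and the changed files arrive on stdin instead of from `git diff` so the logic can be tried without a repository. Folder names are assumed to contain no regex metacharacters.

```shell
#!/bin/sh
# Turn a space-separated folder list such as "a b/c" into the anchored
# extended regex "^(a|b/c)(/|$)".
folders_to_regex() {
    printf '^(%s)(/|$)' "$(printf '%s' "$1" | tr -s ' ' '|')"
}

# is_affected INCLUDE_FOLDERS EXCLUDE_FOLDERS
# Reads changed file paths on stdin. Succeeds if at least one path is inside
# an included folder and not inside an excluded one.
is_affected() {
    include_regex="$(folders_to_regex "$1")"
    if [ -n "$2" ]; then
        exclude_regex="$(folders_to_regex "$2")"
        grep -vE "$exclude_regex" | grep -qE "$include_regex"
    else
        # An empty exclude list must not be turned into a regex, because an
        # empty alternative would match (and therefore exclude) every path.
        grep -qE "$include_regex"
    fi
}

# Example with the mapping above:
#   git --no-pager diff --name-only "${last_build_revision}..${BUILDKITE_COMMIT}" |
#       is_affected "buildkite-example" "buildkite-example/excludes"
```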

SanCoder-Q avatar Oct 30 '18 22:10 SanCoder-Q

At @codercom, we have a dynamic pipeline that, based on the directories changed, injects the appropriate steps into the pipeline to test/deploy them.

We're probably going to move to a solution like @SanCoder-Q's, though, where the root pipeline generates triggers for all pipelines that need to run based on the changed directories.

nhooyr avatar Feb 13 '19 03:02 nhooyr

I was just speaking with @buddyspike at DevOps Day conference in Sydney about https://github.com/mbtproject/mbt. Might be a useful tool to look at for mono-repos.

There are lots of things in this issue that we'd like to build, but for now dynamically uploading trigger steps that execute specific pipelines for each module seems like a good approach.
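A minimal sketch of dynamically uploading trigger steps, assuming the list of affected pipeline slugs has already been worked out elsewhere. The function name `emit_trigger_steps` and the slugs `api` and `web` are placeholders.

```shell
#!/bin/sh
# Sketch: print a pipeline with one trigger step per pipeline slug passed as
# an argument. BUILDKITE_COMMIT and BUILDKITE_BRANCH are set by the agent
# during a build.
emit_trigger_steps() {
    echo "steps:"
    for pipeline in "$@"; do
        echo "  - trigger: '${pipeline}'"
        echo "    label: ':rocket: Trigger ${pipeline}'"
        echo "    build: { commit: '${BUILDKITE_COMMIT}', branch: '${BUILDKITE_BRANCH}' }"
    done
}

# In a build step the output would be piped to the agent, for example:
#   emit_trigger_steps api web | buildkite-agent pipeline upload
```

When no pipeline is affected, the caller would probably want to skip the upload rather than send an empty step list.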

I'd love to see the following building blocks:

  • A buildkite-agent skip command that marks a build as skipped
  • A pre-accept hook in the agent that allows code to be run before the agent officially accepts it (perhaps skip is allowed here?)
  • Make the changed files information from the GitHub webhooks available to builds, OR make it easier to figure out what base commit (last successful build?) to use for a local git operation. Currently, this needs to use an API call.
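For the last point, the API call could be sketched roughly as below. This assumes the Buildkite REST API builds listing filtered by `branch` and `state=passed`, that `curl` and `jq` are available on the agent, and that `BUILDKITE_API_TOKEN` (a name chosen here) holds a token allowed to read builds. Branch names containing special characters would need URL-encoding, which this sketch skips.

```shell
#!/bin/sh
# builds_url ORG_SLUG PIPELINE_SLUG BRANCH
# Builds the URL that lists the most recent passed build on a branch.
builds_url() {
    printf 'https://api.buildkite.com/v2/organizations/%s/pipelines/%s/builds?branch=%s&state=passed&per_page=1' "$1" "$2" "$3"
}

# last_passed_commit ORG_SLUG PIPELINE_SLUG BRANCH
# Prints the commit of that build, or nothing if there is none.
# Assumption: BUILDKITE_API_TOKEN is a hypothetical variable holding an API token.
last_passed_commit() {
    curl -fsS -H "Authorization: Bearer ${BUILDKITE_API_TOKEN}" \
        "$(builds_url "$1" "$2" "$3")" | jq -r '.[0].commit // empty'
}

# Inside a build this might be called as:
#   base="$(last_passed_commit "$BUILDKITE_ORGANIZATION_SLUG" "$BUILDKITE_PIPELINE_SLUG" "$BUILDKITE_BRANCH")"
#   git --no-pager diff --name-only "${base}..${BUILDKITE_COMMIT}"
```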

lox avatar Oct 11 '19 03:10 lox

We currently use buildkite with a mobile monorepo where the setup looks like this:

  1. We have a single pipeline
  2. We use the webhook to upload our checked in config that only has one step to call run.sh
  3. If an environment variable isn't set, we run a python script that hits the github api and checks what builds need to be triggered
  4. From the python script we start N jobs with the API, setting environment variables that specify what needs to happen
  5. These jobs go back through run.sh but hit a different path to actually run the build

This has a few small downsides:

  • GitHub statuses are best managed by each individual build since the per-job handling doesn't really fit this case
  • We end up creating N builds per PR, instead of 1 build with N jobs (I haven't checked if the latter is possible even)
  • If we wanted to have significantly different agent targeting rules I'm not sure if this is possible since right now we always use the same single step with the same script and same targeting

But overall this works great for us!
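The two paths through run.sh described in the steps above might be sketched like this. `BUILD_TARGET`, `plan_builds`, and `run_build` are hypothetical names; the real planning step is a Python script that queries the GitHub API and starts builds, which is stubbed out here.

```shell
#!/bin/sh
# Placeholders for the Python planning script and the real build commands.
plan_builds() { echo "planning: would query GitHub and start one build per affected target"; }
run_build()   { echo "building: $1"; }

dispatch() {
    if [ -z "${BUILD_TARGET:-}" ]; then
        # First pass: no target set, so work out what needs building.
        plan_builds
    else
        # Second pass: a planned build re-enters run.sh with its target set.
        run_build "$BUILD_TARGET"
    fi
}

# run.sh would end by calling:
#   dispatch
```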

keith avatar Oct 11 '19 05:10 keith

Hey @SanCoder-Q, how are you handling last_build_revision? Are you saving this state somewhere in buildkite itself?

gtramontina avatar Jun 11 '20 07:06 gtramontina