Support Shallow Clones

Open piotrb opened this issue 8 years ago • 10 comments

The main idea is that for larger repos a full fetch is unnecessary and slow.

A few key points:

  1. in shallow mode, repositories should not be cached between builds (shallow clones are finicky and should be made fresh each time)
  2. the shallow clone should just attempt to fetch the specific sha1 the test needs

This should help not overload upstream providers with a lot of fetches and should speed up the initial fetch a lot.

I'm optimizing for the scenario where host machines are likely to have an entirely cold repo root cache and the download will take a significant amount of time, especially when upstream is feeling a bit sluggish, like during the GitHub outage on Mar 14, 2017.

piotrb avatar Mar 14 '17 04:03 piotrb

@piotrb Was thinking of this use-case - Can't we already set "git-clone-flags" with "--depth=1" to the agent configuration?

https://buildkite.com/docs/agent/configuration

mallyvai avatar May 19 '17 21:05 mallyvai

No, at least not last time I checked. While the parameter is passed just fine, the checked-out working copy ends up broken, which makes the agent basically useless since it can't find refs properly on the shallow clone (it's been a while, so I'm a bit vague about the specifics).

piotrb avatar May 19 '17 22:05 piotrb

Thanks Piotr. This is a great (and common) optimization in build systems. Going to +1 this. At a bare minimum it should be documented IMO! :)

mallyvai avatar May 19 '17 22:05 mallyvai

I've avoided this because historically (pre git 1.9) shallow clones had lots of edge conditions. Perhaps we will look at this post v3.

lox avatar Nov 04 '17 01:11 lox

Hi! Just checking on this - I saw PR #957 getting merged to master, but no mention in the changelogs & wasn't sure if it has made it to a release, and if so, from which version up is it available? Thanks! <3

Walther avatar May 20 '19 23:05 Walther

One edge condition of shallow clone and fetch may be that git creates a lock file when fetching shallow.

With parallel agents running the git fetch on the same directory, I've had this error:

git fetch -v --prune --depth=1 origin refs/pull/5190/head
fatal: Unable to create '/var/lib/buildkite-agent/builds/buildkite-i-010121511371dd0c0-1/kimoby/kimoby-rails/.git/shallow.lock': File exists.

So git-clone-flags and git-fetch-flags allow shallow clone and fetch, but they do not seem compatible with parallel agents running the commands in the same directory.

I don't know how to prevent this with the hosted solution. Any idea?

prathe avatar Apr 08 '20 18:04 prathe

Sorry, I pressed the wrong button!

sj26 avatar Oct 13 '20 23:10 sj26

I think shallow clones will work fine if you configure the agent with git-clone-flags="--depth=1". We have a clone lock now which avoids multiple agents on the same machine stomping on each other during cloning [if using a shared clone, like with mirrors]. And this ignores the potential shortcomings of a shallow-cloned repository when doing more than a simple clone and checkout.

If you want to make sure every job runs in a fresh clone, you could add a pre-exit hook which removes the build directory.
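A minimal sketch of such a pre-exit hook, assuming the agent exposes the checkout directory as BUILDKITE_BUILD_CHECKOUT_PATH. The remove_checkout helper and its guard against empty paths are illustrative names and choices, not part of the agent:

```shell
#!/usr/bin/env bash
# Hypothetical agent hook, saved as hooks/pre-exit.
set -euo pipefail

# Remove the checkout directory given as $1.
# Refuses an empty path or "/" so a missing variable cannot wipe the host.
remove_checkout() {
  local dir="${1:-}"
  if [ -z "$dir" ] || [ "$dir" = "/" ]; then
    echo "refusing to remove '${dir}'" >&2
    return 1
  fi
  rm -rf -- "$dir"
}

# Only act when the agent has told us where the checkout lives (assumption).
if [ -n "${BUILDKITE_BUILD_CHECKOUT_PATH:-}" ]; then
  remove_checkout "$BUILDKITE_BUILD_CHECKOUT_PATH"
fi
```

With the directory gone, the next job on that agent has nothing to reuse and starts from a fresh clone.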

For a further optimisation, you could try single-branch fetching too, with or without the depth limit, by using an environment or pre-checkout hook like this (but YMMV!):

export BUILDKITE_GIT_CLONE_FLAGS="--single-branch --branch=$BUILDKITE_BRANCH --depth=1"

To my knowledge there is no way to clone a particular commit sha, so this is still going to be a bit brittle. The depth could be increased based on workflow.

Shallow fetching is a bit of a different beast. We've avoided that in favour of optimising for commit-only fetch [which is now fairly widely supported], or trying to fetch a specific branch or pull request head to achieve faster fetch with lower impact on code providers.

sj26 avatar Oct 13 '20 23:10 sj26

it seems github now supports uploadpack.allowReachableSHA1InWant=true so the single commit "no-clone" clone described here for git >= 2.5.0 now works:

git init
git remote add origin git@github.com:<USER>/<REPO>.git
git fetch --depth 1 origin <SHA>
git checkout FETCH_HEAD

might be nice to have this as an optional replacement to the current mechanism
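For reference, the four commands above could be wrapped in a small helper. This is only a sketch: fetch_single_commit and its argument order are made up for illustration, and it assumes git >= 2.5.0 and a server that allows fetching the requested commit by SHA:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Fetch exactly one commit into a fresh repository and check it out.
#   $1 = repository URL, $2 = full commit SHA, $3 = target directory
fetch_single_commit() {
  local repo="$1" sha="$2" dir="$3"
  git init --quiet "$dir"                             # creates $dir if needed
  git -C "$dir" remote add origin "$repo"
  git -C "$dir" fetch --quiet --depth 1 origin "$sha" # history is cut at $sha
  git -C "$dir" checkout --quiet FETCH_HEAD           # detached HEAD at $sha
}
```

It would be called as fetch_single_commit <URL> <SHA> <DIR>. The resulting working copy is on a detached HEAD with no branches, so anything that expects a branch name would need extra handling.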

fledman avatar Dec 04 '20 20:12 fledman

For checkouts of a commit with an existing local clone, we do try a commit fetch (sha1 in want) first: https://github.com/buildkite/agent/blob/4c557243c426c210ca1037faca1d407597a9dd6e/bootstrap/bootstrap.go#L1154

It respects git-fetch-flags, so you can supply --depth 1.

The trick is we tend to clone directly instead of doing a init/fetch. Perhaps we should consider splitting it so we can fetch the single commit for a fresh clone. But then we'd need an init flags or remote-add flags which allow setting the things you can do in a clone command. Hmm. 🤔

You could work around this at the moment by creating a pre-checkout agent hook which did an appropriate init of the repository, forcing the bootstrap to go down the fetch route instead of cloning?
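A rough sketch of what such a pre-checkout hook might look like. It assumes the agent provides BUILDKITE_BUILD_CHECKOUT_PATH and BUILDKITE_REPO, that an existing .git directory is enough to make the bootstrap fetch rather than clone, and that BUILDKITE_GIT_FETCH_FLAGS can be overridden from a hook. None of that is confirmed here, and init_empty_checkout is a made-up helper name:

```shell
#!/usr/bin/env bash
# Hypothetical agent hook, saved as hooks/pre-checkout.
set -euo pipefail

# Create an empty repository with an "origin" remote unless one already exists.
#   $1 = checkout directory, $2 = repository URL
init_empty_checkout() {
  local dir="$1" repo="$2"
  mkdir -p "$dir"
  if [ ! -d "$dir/.git" ]; then
    git -C "$dir" init --quiet
    git -C "$dir" remote add origin "$repo"
  fi
}

# Only act when both variables are present (assumed to be set by the agent).
if [ -n "${BUILDKITE_BUILD_CHECKOUT_PATH:-}" ] && [ -n "${BUILDKITE_REPO:-}" ]; then
  init_empty_checkout "$BUILDKITE_BUILD_CHECKOUT_PATH" "$BUILDKITE_REPO"
  # Assumption: the bootstrap picks this up for its commit fetch.
  export BUILDKITE_GIT_FETCH_FLAGS="-v --prune --depth=1"
fi
```

Whether the bootstrap really treats the initialised directory as an existing clone would need checking against the agent version in use.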

sj26 avatar Dec 06 '20 21:12 sj26