When cloning a git repo, set depth to 1

Open shs96c opened this issue 7 years ago • 8 comments

Cloning a repo with git appears to clone the entire repo, rather than just the HEAD or the specified revision. It'd be faster to clone only the specific version that's needed. Presumably this also applies to other version control systems that support shallow clones.

shs96c avatar Sep 27 '17 22:09 shs96c

We should definitely find something like this that works. The wrinkle is that when you're fetching a specific revision, we have to know how far back in history that revision is. A simple strategy might be something like "clone all repos at depth 1, but if we ever hit an error trying to find a ref, do a complete history fetch before giving up". That would be nice and fast for projects that don't specify a revision at all, but I worry it would be slower for the common case (or what I hope is the common case) where you do specify the revision. In that case you're almost guaranteed to need the full history fetch anyway, and now you'd also be paying the cost of a second set of round trips.

Back when I looked into this last, I didn't see any git feature like "fetch only enough history to reach commit xyz," but if such a thing existed it would immediately solve this problem :)
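
A minimal sketch of the "depth 1 first, full fetch on failure" strategy described above, written as a shell function. The name fetch_rev and its arguments are hypothetical and not part of peru; the sketch only assumes a reasonably recent git on the PATH. Whether the fast path succeeds for a bare commit hash depends on the server.

```shell
# Sketch: try a depth-1 fetch of one revision, then fall back to a full fetch.
# fetch_rev is a hypothetical helper, not something peru provides.
fetch_rev() {
    url=$1   # remote URL
    rev=$2   # branch, tag, or commit hash
    dir=$3   # directory to create the repository in

    git init --quiet "$dir" || return 1
    git -C "$dir" remote add origin "$url" || return 1

    # Fast path: one shallow fetch. This is expected to work for branches and
    # tags, and for commit hashes only when the server accepts such requests.
    if git -C "$dir" fetch --quiet --depth 1 origin "$rev" 2>/dev/null; then
        git -C "$dir" checkout --quiet FETCH_HEAD
        return
    fi

    # Slow path: the second set of round trips. Fetch full history and tags,
    # then check out the requested revision.
    git -C "$dir" fetch --quiet --tags origin || return 1
    git -C "$dir" checkout --quiet "$rev"
}
```

Usage would look like `fetch_rev http://example.com/exciting.git <YOUR_HASH_HERE> clone-dir`. The cost concern above shows up directly: when the fast path fails, the work of the slow path is added on top of it.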

oconnor663 avatar Sep 27 '17 22:09 oconnor663

You can clone with a specific version and depth of 1.

shs96c avatar Sep 27 '17 22:09 shs96c

@shs96c are you sure? I've seen ways to make it work with a named branch or tag, but I've never seen it work with an arbitrary commit hash. @olson-sean-k do you have an example working with a commit hash?

oconnor663 avatar Sep 28 '17 14:09 oconnor663

@oconnor663 if it's enabled on the server side via the uploadpack.allowReachableSHA1InWant option, then the following should work:

mkdir clone-dir
cd clone-dir
git init
git remote add origin http://example.com/exciting.git
git fetch --depth 1 origin <YOUR_HASH_HERE>
git checkout FETCH_HEAD

Without that option, you should still be able to grab tags and branches with depth 1.
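
For the tag and branch case, a sketch using a plain shallow clone. The function name shallow_clone_ref is made up for illustration; `--branch` accepts a tag name as well as a branch name and leaves HEAD detached when given a tag.

```shell
# Sketch: depth-1 clone of a named branch or tag.
# shallow_clone_ref is a hypothetical helper name.
shallow_clone_ref() {
    # $1: remote URL, $2: branch or tag name, $3: target directory
    git clone --quiet --depth 1 --branch "$2" "$1" "$3"
}
```

For example, `shallow_clone_ref http://example.com/exciting.git v1.0 clone-dir`, where v1.0 is a placeholder tag name.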

shs96c avatar Sep 28 '17 15:09 shs96c

Looks like GitHub doesn't enable that option? :(

error: Server does not allow request for unadvertised object cf0346161ccd3642defedeb4850a6c33406e56d6

oconnor663 avatar Sep 28 '17 15:09 oconnor663

GitLab can be configured to do it. On GitHub, I've checked that tags work as expected.

shs96c avatar Sep 28 '17 15:09 shs96c

I'm not opposed to implementing something that tries this and falls back to the current behavior if it fails. There are two reasons in my head it might not be worth the complexity:

  • The Most Blessed Common Case is fetching a specific git commit by hash from GitHub. It's a shame not to be able to speed that one up. (There might be room for a heuristic like "fetch a depth of 1000 at first, since most projects in the wild fetch in that range", but I haven't done any sort of testing for it.)
  • If you set PERU_CACHE_DIR to something like $HOME/.cache/peru, all the repos peru clones behind the scenes will get saved there. With this setting, even a perfect --depth value would only matter the very first time you synced a given repo on that machine. I've considered turning on this cache dir by default, since it's hard to discover.
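
A rough sketch of the "fetch a fixed depth first" heuristic from the first bullet, untested against real-world repos just like the idea itself. The function name fetch_with_initial_depth and the default of 1000 are assumptions for illustration; `git rev-parse --is-shallow-repository` needs git 2.15 or newer.

```shell
# Sketch: fetch a bounded amount of history, deepen only if the commit is missing.
# fetch_with_initial_depth is a hypothetical helper, not part of peru.
fetch_with_initial_depth() {
    url=$1            # remote URL
    commit=$2         # full commit hash to check out
    dir=$3            # directory to create the repository in
    depth=${4:-1000}  # initial depth; 1000 is only a guess

    git init --quiet "$dir" || return 1
    git -C "$dir" remote add origin "$url" || return 1
    git -C "$dir" fetch --quiet --depth "$depth" origin || return 1

    # If the commit is not within the first $depth commits of any branch,
    # fetch the rest of the history. --unshallow is only valid on a shallow repo.
    if ! git -C "$dir" cat-file -e "$commit^{commit}" 2>/dev/null; then
        if [ "$(git -C "$dir" rev-parse --is-shallow-repository)" = "true" ]; then
            git -C "$dir" fetch --quiet --unshallow origin || return 1
        fi
    fi

    # Fails here if the commit is not reachable from any fetched branch.
    git -C "$dir" checkout --quiet "$commit"
}
```

This works against servers that reject requests for arbitrary hashes, since it only ever asks for branch tips. The trade-off is that a commit older than the initial depth still costs two fetches.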

oconnor663 avatar Sep 28 '17 15:09 oconnor663

If it's a PITA to implement, it's probably not worth the effort until monorepos catch on. :)

shs96c avatar Sep 28 '17 15:09 shs96c