peru
When cloning a git repo, set depth to 1
Cloning a repo with git appears to clone the entire history, rather than just the HEAD or the specified revision. It'd be faster to clone only the specific version that's needed. Presumably this also applies to other version control systems that support shallow clones.
We should definitely find something like this that works. The wrinkle is that when you're fetching a specific revision, we have to know how far back in history that revision is. A simple strategy might be something like "clone all repos at depth 1, but if we ever hit an error trying to find a ref, do a complete history fetch before giving up". That would be nice and fast for projects that don't specify a revision at all, but I worry it would be slower for the common case (or what I hope is the common case) where you do specify the revision. In that case you're almost guaranteed to need the full history fetch anyway, and now you'd also be paying for a second set of round trips.
Back when I looked into this last, I didn't see any git feature like "fetch only enough history to reach commit xyz," but if such a thing existed it would immediately solve this problem :)
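For illustration, here is a minimal sketch of the "depth 1 first, full fetch on failure" strategy described above. It is not how peru does things today. It assumes the current directory is a git repo with a remote named origin, that the revision is a commit hash, and the function name fetch_shallow_then_full is made up for this example.

```shell
# Hypothetical helper, not part of peru: try a shallow fetch of REV first,
# then fall back to a complete history fetch if the shallow attempt fails.
# Assumes a remote named "origin" is already configured in the current repo.
fetch_shallow_then_full() {
    rev=$1
    # Fast path: one commit only. This fails when the server refuses
    # requests for a commit that is not the tip of a branch or tag.
    if git fetch --depth 1 origin "$rev"; then
        return 0
    fi
    # Slow path: fetch everything the remote advertises. This is the
    # second set of round trips mentioned above.
    git fetch origin || return 1
    # Confirm the commit actually arrived before reporting success.
    git cat-file -e "$rev^{commit}"
}
```

Called as fetch_shallow_then_full <YOUR_HASH_HERE>, it returns nonzero only if the commit is still missing after the full fetch.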
You can clone with a specific version and depth of 1.
@shs96c are you sure? I've seen ways to make it work with a named branch or tag, but I've never seen it work with an arbitrary commit hash. @olson-sean-k do you have an example working with a commit hash?
@oconnor663 if it's enabled on the server side via the uploadpack.allowReachableSHA1InWant option, then the following should work:
mkdir clone-dir
cd clone-dir
git init
git remote add origin http://example.com/exciting.git
git fetch --depth 1 origin <YOUR_HASH_HERE>
git checkout FETCH_HEAD
Without that option, you should still be able to grab tags and branches with depth 1.
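For the branch and tag case, something like the following should do it in a single command. This is only a sketch: the function name shallow_clone_ref is hypothetical, and it relies on git clone accepting a tag as well as a branch for --branch.

```shell
# Hypothetical helper: shallow-clone a single named branch or tag.
# URL, ref name and destination directory are supplied by the caller.
shallow_clone_ref() {
    url=$1
    ref=$2
    dest=$3
    # --branch takes a branch or a tag name (not a commit hash);
    # --depth 1 truncates history to the one commit that ref points at.
    git clone --depth 1 --branch "$ref" "$url" "$dest"
}
```

For example, shallow_clone_ref http://example.com/exciting.git <YOUR_TAG_HERE> clone-dir. With a tag, the resulting checkout ends up on a detached HEAD.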
Looks like GitHub doesn't enable that option? :(
error: Server does not allow request for unadvertised object cf0346161ccd3642defedeb4850a6c33406e56d6
GitLab can be configured to do it. On GitHub, I've checked that tags work as expected.
I'm not opposed to implementing something that tries this and falls back to the current behavior if it fails. There are two reasons in my head it might not be worth the complexity:
- The Most Blessed Common Case is fetching a specific git commit by hash from GitHub. It's a shame not to be able to speed that one up. (There might be room for a heuristic like "fetch a depth of 1000 at first, since most projects in the wild fetch in that range", but I haven't done any sort of testing for it.)
- If you set PERU_CACHE_DIR to something like $HOME/.cache/peru, all the repos peru clones behind the scenes will get saved there. With this setting, even a perfect --depth value would only matter the very first time you synced a given repo on that machine. I've considered turning on this cache dir by default, since it's hard to discover.
If it's a PITA to implement, it's probably not worth the effort until monorepos catch on. :)