crater icon indicating copy to clipboard operation
crater copied to clipboard

Replace shelling out git in `src/git.rs` with usage of `git2` crate.

Open tomprince opened this issue 8 years ago • 9 comments

tomprince avatar Jun 04 '17 23:06 tomprince

Note that while this may be better in the long run, the code usually gets more complex and is sometimes harder to write/test.

Mark-Simulacrum avatar Jun 04 '17 23:06 Mark-Simulacrum

@Mark-Simulacrum, If we still want to go ahead and make this change, I can work on it and send a PR. I've already begun making changes locally using git2-rs, and it seems relatively straightforward. Please let me know your thoughts.

EDIT: Unfortunately, got stuck with the issue where libgit2 (See: https://github.com/libgit2/libgit2/issues/3058) - and thereby git2rs - does not support shallow clones yet. So this most definitely is not straightforward, sadly.

balajisivaraman avatar Mar 19 '18 13:03 balajisivaraman

Hm, I think shallow clones might actually be hurting us in the long run as we don't particularly care about disk space and we presumably end up making them deep in many cases anyway while re-pulling, though I'm not certain. It'd be good to figure out how much we're gaining in performance and/or disk space because of shallow clones.

Mark-Simulacrum avatar Mar 19 '18 18:03 Mark-Simulacrum

The original command used to clone effectively becomes irrelevant after the first run because all the git repos are cached so it just pulls new commits after that (which doesn't deepen by default). Shallow clones are a net benefit but aren't too much of a big deal.

aidanhs avatar Mar 28 '18 15:03 aidanhs

Uhm, there are some issues with swallow clones on GitHub making them more expensive than normal clones. Sure, we're not at that scale, but I think that's something worth considering.

pietroalbini avatar Mar 28 '18 17:03 pietroalbini

I'm aware of that issue, but there are many things different about it that I don't think it's very informative for us:

  1. in the two years since that comment was posted, github submitted a patch to git to improve fetch performance for shallow clones
  2. rather than one huge repo with a pathological directory structure that basically always has many new commits, our pattern is many tiny repos with mostly 0 new commits since the last run
  3. we have two machines which update the repos once every 4 days, rather than ~100,000 users updating all the time

Point 2 is probably the most notable - if there are no changes, there's nothing for github to do.

To be honest, I'd be :+1: on normal clones just because they're less exotic and less likely to cause surprises. Other than that its pretty much a wash - disk savings aren't too interesting when just doing a crater run takes about 1TB.

aidanhs avatar Mar 28 '18 21:03 aidanhs

Hmmm. Just following the conversation, I get the feeling that we're OK with switching to git2rs and using regular clones. If that is the case, I'd be interested in taking this issue to completion and closing it. 👍

balajisivaraman avatar Mar 29 '18 09:03 balajisivaraman

@balajisivaraman that sounds great - I'm certainly willing to review and merge if you make a PR :)

aidanhs avatar Mar 29 '18 10:03 aidanhs

@aidanhs, Thanks for the confirmation! I'll get started on it. :-)

balajisivaraman avatar Mar 29 '18 10:03 balajisivaraman