Compare bundle with latest commit to reduce backup time

Open jonhadfield opened this issue 3 years ago • 1 comments

To address question from @drummingdemon:

I was wondering whether it is somehow possible to fetch the latest commit from the previous bundle to speed up subsequent soba runs (this way the git clone and bundle steps might be spared) - but as far as I can tell, the only way would be to restore the bundle back to a repo and then compare the latest commits...?

It seems you can get the SHA of the commit HEAD is pointing to from a remote repo without first cloning:

$ git ls-remote https://github.com/jonhadfield/soba.git|grep HEAD
a18fd2809de118625dc41760d22dbe47c1ac74fa	HEAD

And you can do the same for a bundle without unbundling:

$ git bundle list-heads myrepo.20221004200732.bundle
57c17ee3178cca6f4b7ed1b0f52589610be6dc2c refs/heads/master
57c17ee3178cca6f4b7ed1b0f52589610be6dc2c HEAD

From there I guess I could compare the two SHAs and make a decision on whether or not to re-clone, bundle, and compare.
I guess it could be a popular use-case, as a lot of devs won't be re-basing etc.
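
As a rough illustration of that comparison (not soba's actual implementation), the sketch below parses the text printed by the two commands above and decides whether a new clone and bundle looks necessary. The helper names `parse_refs` and `needs_backup` are hypothetical, and the sketch assumes both listings cover the same set of refs. It compares every listed ref rather than only HEAD, since a HEAD-only check would not notice changes on other branches.

```python
from typing import Dict


def parse_refs(output: str) -> Dict[str, str]:
    """Parse 'SHA<whitespace>REF' lines into a {ref: sha} mapping.

    Both `git ls-remote` (tab-separated) and `git bundle list-heads`
    (space-separated) print one such pair per line.
    """
    refs = {}
    for line in output.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            sha, ref = parts
            refs[ref.strip()] = sha
    return refs


def needs_backup(remote_output: str, bundle_output: str) -> bool:
    """Hypothetical helper: True when a fresh clone and bundle looks necessary."""
    remote = parse_refs(remote_output)
    bundle = parse_refs(bundle_output)
    if not remote or not bundle:
        return True  # nothing to compare against, so stay on the safe side
    return remote != bundle


# Sample text copied from the two commands shown above
# (they refer to different repositories, so a mismatch is expected).
remote_out = "a18fd2809de118625dc41760d22dbe47c1ac74fa\tHEAD\n"
bundle_out = (
    "57c17ee3178cca6f4b7ed1b0f52589610be6dc2c refs/heads/master\n"
    "57c17ee3178cca6f4b7ed1b0f52589610be6dc2c HEAD\n"
)
print(needs_backup(remote_out, bundle_out))
```

In practice the two strings could come from `subprocess.run([...], capture_output=True, text=True).stdout` for each command.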

Is that the sort of thing you mean?

jonhadfield avatar Oct 05 '22 17:10 jonhadfield

Thanks for moving this to a separate topic - yes, this is precisely what I was referring to!

I imagine some users would still feel 'safer' with the current way of operation, so this comparison mechanic could probably be toggled with an env var?

drummingdemon avatar Oct 05 '22 19:10 drummingdemon

I've just pushed a new release that's got a hidden feature (whilst testing). It works exactly as I put in the opening comment, and speeds things up a lot. If you get a chance to try it, simply add an environment variable: SOBA_DEV=true and let me know how you get on.
I'd love to get a Git expert to call out any downsides to the approach, so will do some research and maybe put a post out on Stack Overflow.
Btw, I've already noticed some spurious, but harmless, output that needs fixing.

jonhadfield avatar Oct 30 '22 17:10 jonhadfield

Seems to be working really well and yes, it has greatly sped things up (the complete run came down to 8 minutes) - also, as far as I can tell it didn't falsely skip any modified repos.

drummingdemon avatar Nov 01 '22 12:11 drummingdemon

Thanks for the feedback! I'll do some more of my own testing to check it works as hoped, and put the question out to the experts on any potential gotchas.

jonhadfield avatar Nov 01 '22 20:11 jonhadfield

So far, it looks like a positive move: https://stackoverflow.com/questions/74281792/git-comparing-local-bundle-with-remote-repository-using-refs-only. I've got some tidying up to do, and some tests to write, but will try and get a new beta out this week.

jonhadfield avatar Nov 02 '22 09:11 jonhadfield

Just a heads-up.
I've pushed a new beta that replaces the SOBA_DEV trigger with:

  • GITHUB_COMPARE
  • GITLAB_COMPARE
  • BITBUCKET_COMPARE

each supporting a value of either 'refs' or 'clone', where 'clone' is the default if not specified:

  • refs = compare the latest bundle's refs (if a bundle already exists) against the remote's refs, without cloning
  • clone = clone the remote, bundle it, and compare the new bundle's size and checksum with the latest local bundle (if one exists)
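
To make the setting concrete, here is a minimal sketch of how a tool might resolve the mode from those variables. It is not soba's code: the function name `compare_mode`, the case-insensitive handling, and the choice to raise on an unknown value are all assumptions. Only the variable names, the two values, and the 'clone' default come from the description above.

```python
import os

VALID_MODES = ("refs", "clone")


def compare_mode(provider: str, env=None) -> str:
    """Hypothetical helper: return 'refs' or 'clone' for a provider.

    provider is e.g. 'github', 'gitlab' or 'bitbucket'; the variable
    looked up is '<PROVIDER>_COMPARE'.
    """
    env = os.environ if env is None else env
    value = env.get(provider.upper() + "_COMPARE", "").strip().lower()
    if value == "":
        return "clone"  # documented default when the variable is not set
    if value not in VALID_MODES:
        # Assumption: reject typos rather than silently falling back.
        raise ValueError("unsupported compare mode: " + repr(value))
    return value


print(compare_mode("github", {"GITHUB_COMPARE": "refs"}))
print(compare_mode("gitlab", {}))
```

Passing a plain dict as `env` keeps the sketch easy to try without touching the real environment.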

The reason for defaulting to clone is that someone's pointed out that comparing refs alone doesn't account for the possibility that the remote is shallow (older commits removed). I doubt it's a common situation, but I don't want to make any assumptions when it comes to backups.
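
For comparison, the 'clone' check (size and checksum of the new bundle against the existing one) could be sketched roughly as follows. This is an illustration under assumptions, not soba's implementation: the function names are hypothetical, and SHA-256 is an assumed choice because the thread does not say which checksum is used.

```python
import hashlib
import os


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bundles_match(new_bundle: str, old_bundle: str) -> bool:
    """Hypothetical helper: True when both bundle files are byte-identical."""
    if not os.path.exists(old_bundle):
        return False  # no earlier bundle, so the new one must be kept
    if os.path.getsize(new_bundle) != os.path.getsize(old_bundle):
        return False  # cheap size check before hashing
    return file_sha256(new_bundle) == file_sha256(old_bundle)
```

Checking the size first avoids hashing two large files when they obviously differ.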

I'm leaving my own setup running for a few days to see if there are any real-world issues I've not considered. Hope to get the final release out this week.

jonhadfield avatar Nov 06 '22 09:11 jonhadfield

Awesome, looking forward to it! :)

drummingdemon avatar Nov 06 '22 17:11 drummingdemon

Now released. Thanks.

jonhadfield avatar Nov 13 '22 08:11 jonhadfield