backup-utils

Repository Liberation

Open nugend opened this issue 10 years ago • 6 comments

It'd be nice if this tool supported a liberation mode wherein the entirety of the repos stored in the backups could be made into a collection of bare repositories organized by owner.

I realize that this would mean that the user may no longer be a customer, but I think it'd be a remarkable good-will gesture.

nugend avatar Sep 26 '14 21:09 nugend

Agreed. If you're running against an 11.10.34x release, the repositories/ directory under every snapshot is very nearly usable as you've described. Top-level directories under repositories/ are owner names, and their repositories are stored as fully functional bare <repository-name>.git repositories. The one kink here is that the hooks directory under each repository is a symlink to a hard-coded location on the GitHub Enterprise appliance filesystem. Those symlinks would need to be adjusted, but that could be done fairly easily.
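A minimal sketch of that adjustment, assuming you are working on a writable copy of a snapshot's repositories/ directory (not the snapshot itself) and that swapping each symlink for an empty hooks directory is acceptable. The function name fix_hooks and the owner/repo.git glob are illustrative assumptions, not part of backup-utils:

```shell
# Hypothetical helper: replace each hooks symlink under EXPORT_ROOT/<owner>/<repo>.git
# with an empty directory, so the repository no longer points at the appliance filesystem.
fix_hooks() {
  export_root=$1
  for link in "$export_root"/*/*.git/hooks; do
    # Skip anything that is not a symlink (including an unmatched glob).
    [ -L "$link" ] || continue
    rm -- "$link"
    mkdir -- "$link"
  done
}
```

Usage would be something like fix_hooks /path/to/export, run once after copying the repositories/ tree out of the snapshot.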

Let me think about how best to document this under https://github.com/github/backup-utils#backup-snapshot-file-structure. I do think it's worth pointing out and is something we plan to retain moving forward.

rtomayko avatar Sep 26 '14 21:09 rtomayko

Neato!

nugend avatar Sep 26 '14 22:09 nugend

@rtomayko How does this change with the new repository format, given that the repositories are now rather abstractly named and share files among forks?

xeago avatar May 10 '15 09:05 xeago

@Xeago Good catch. This is definitely more complicated with the changes in GHE 2.2 but still very possible. I'd like to provide some new tools with backup-utils to make this easier. My sense is there are two cases worth thinking about.

1. Grabbing a copy of a single repository

Useful when you need to pull a deleted repository from a historical snapshot or when some kind of corruption has ruined an active copy.

This can be accomplished via a simple git clone against the backup snapshot's repository:

cd $GHE_DATA_DIR
git clone current/repositories/0/nw/01/23/45/678/678.git /path/to/copy.git

Using git clone instead of a simple cp is necessary because git object data is now stored in a separate network.git repository. Copying a repository directory directly will result in a repository with no git objects.

The hard part in this scenario is finding the path of the repository in the snapshot, since repositories are no longer named on disk in simple "user/repo.git" format. There is an "info/nwo" file in each repository directory with the "user/repo" name, which could be used for this purpose.
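A rough sketch of such a lookup, assuming each info/nwo file contains just the "user/repo" name on a single line. The function name find_repo_by_nwo is hypothetical and not something backup-utils ships:

```shell
# Hypothetical helper: print the snapshot directory of the repository whose
# info/nwo file matches the requested "user/repo" name.
find_repo_by_nwo() {
  snapshot=$1
  wanted=$2
  find "$snapshot/repositories" -type f -path '*/info/nwo' |
    while IFS= read -r nwo_file; do
      if [ "$(cat "$nwo_file")" = "$wanted" ]; then
        # Strip the trailing /info/nwo to get the repository directory.
        dirname "$(dirname "$nwo_file")"
      fi
    done
}
```

For example, find_repo_by_nwo "$GHE_DATA_DIR/current" user/repo would print the numeric path to pass to git clone. This walks every repository in the snapshot, so it will be slow on large instances.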

We've also been considering creating a hierarchy of symlinks on the instance to map "user/repo" names to their numeric repository directories. If we included this in the backup, locating repositories by name would be quite a bit easier.
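Until something like that exists, a similar index could be built on the backup host from the info/nwo files. This is only a sketch: build_nwo_index and the separate index directory are assumptions of mine, it lives outside the snapshot rather than on the instance, and it trusts the contents of info/nwo to be a plain "user/repo" name:

```shell
# Hypothetical helper: create INDEX/<user>/<repo>.git symlinks that point at the
# numeric repository directories found in a snapshot.
build_nwo_index() {
  snapshot=$1
  index=$2
  find "$snapshot/repositories" -type f -path '*/info/nwo' |
    while IFS= read -r nwo_file; do
      nwo=$(cat "$nwo_file")
      # Resolve the repository directory to an absolute path for the link target.
      repo_dir=$(cd "$(dirname "$(dirname "$nwo_file")")" && pwd -P)
      mkdir -p "$index/$(dirname "$nwo")"
      ln -sfn "$repo_dir" "$index/$nwo.git"
    done
}
```

With the index in place, git clone /path/to/index/user/repo.git /path/to/copy.git would resolve through the symlink to the numeric directory.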

2. Exporting all repositories in user/repo format

This is closer in spirit to the issue's original request. Given a hierarchy of repositories stored in the new filesystem layout, produce a "liberated" copy -- something you could throw into pretty much any git server environment.

This could be accomplished by applying the solution for 1) to all repositories present in the backup, but I think it'd be worth considering some optimizations here because cloning each repository would both take a long time and require a large amount of disk space.

  • Copy repositories into the export location via cp -rl or rsync --link-dest, hardlinking files to save space and speed things up.
  • For alternated repositories, copy the entire contents of the shared object store (network.git/objects) to the repository's objects directory, also via hardlink. You can tell if a repository is alternated by the presence of an objects/info/alternates file.

With these optimizations, we should be able to reconstruct the GHE <= v2.1 repository backup structure on demand fairly quickly and without requiring 2x or more disk space, assuming the export is written to the same volume.
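The two optimizations could be sketched for a single repository roughly as follows. This is an illustration under several assumptions, not a backup-utils tool: export_repo is a hypothetical name, cp must support -l (GNU cp does), the destination must not exist yet and must sit on the same volume as the snapshot, and alternates that themselves have alternates are not followed:

```shell
# Hypothetical helper: hardlink-copy one bare repository from a snapshot, then
# hardlink the shared object store into it so the copy stands on its own.
export_repo() {
  src=$1
  dest=$2
  mkdir -p "$(dirname "$dest")"
  cp -Rl "$src" "$dest"

  alt_file=$dest/objects/info/alternates
  [ -f "$alt_file" ] || return 0   # not alternated, nothing more to do

  while IFS= read -r alt || [ -n "$alt" ]; do
    case $alt in
      ''|'#'*) continue ;;              # skip blank and comment lines
      /*) alt_dir=$alt ;;               # absolute path
      *)  alt_dir=$src/objects/$alt ;;  # relative to the objects directory
    esac
    # Link only object data: the pack directory and the two-hex-digit loose
    # object directories. -f replaces an object that already exists in the copy.
    for entry in "$alt_dir"/pack "$alt_dir"/[0-9a-f][0-9a-f]; do
      [ -d "$entry" ] || continue
      cp -Rlf "$entry" "$dest/objects/"
    done
  done < "$alt_file"

  # The copy now holds every object itself, so drop the alternates pointer.
  rm -f "$alt_file"
}
```

Combined with the info/nwo name, a full export would call this once per repository, for example export_repo current/repositories/0/nw/01/23/45/678/678.git /path/to/export/user/repo.git. Running git fsck in the result is a reasonable check that no objects are missing.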

rtomayko avatar May 11 '15 13:05 rtomayko

Any news on this?

azzlack avatar Jun 22 '16 12:06 azzlack

@azzlack I'm sorry I missed this. It's now on my radar. There are no plans to work on this in the near future but I'll discuss that with the team.

/cc @github/backup-utils

rubiojr avatar Jun 22 '16 17:06 rubiojr