backup-utils
Repository Liberation
It'd be nice if this tool supported a liberation mode wherein the entirety of the repos stored in the backups could be made into a collection of bare repositories organized by owner.
I realize that this would mean that the user may no longer be a customer, but I think it'd be a remarkable good-will gesture.
Agreed. If you're running against an 11.10.34x release, the repositories/ directory under every snapshot is very nearly usable as you've described. Top-level directories under repositories/ are owner names, and their repositories are stored as fully functional bare <repository-name>.git repositories. The one kink here is that the hooks directory under each repository is a symlink to a hard-coded location on the GitHub Enterprise appliance filesystem. Those symlinks would need to be adjusted, but that could be done fairly easily.
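A rough sketch of one way to handle the hooks symlinks, assuming you work on a copy of the repositories/ directory rather than on the snapshot itself, and assuming an empty hooks directory is an acceptable replacement (both are my assumptions, as is the helper name replace_hook_symlinks):

```shell
#!/bin/sh
# Hypothetical helper, not part of backup-utils.
# Replaces every `<repo>.git/hooks` symlink under the given directory with an
# empty directory. ASSUMPTION: an empty hooks directory is good enough for the
# target git server. Run this on a copy, never on the snapshot itself.
replace_hook_symlinks() {
  export_root=$1
  # -type l matches only symlinks, so real hooks directories are left alone.
  find "$export_root" -type l -name hooks -path '*.git/hooks' |
    while IFS= read -r link; do
      rm -- "$link" && mkdir -- "$link"
    done
}
```

Something like `replace_hook_symlinks /path/to/copy/of/repositories` would then leave each bare repository without a reference to the appliance filesystem.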
Let me think about how best to document this under https://github.com/github/backup-utils#backup-snapshot-file-structure. I do think it's worth pointing out and is something we plan to retain moving forward.
Neato!
@rtomayko How does this change with the new repository format, given that the repositories are now rather abstractly named and share files among forks?
@Xeago Good catch. This is definitely more complicated with the changes in GHE 2.2 but still very possible. I'd like to provide some new tools with backup-utils to make this easier. My sense is there are two cases worth thinking about.
1. Grabbing a copy of a single repository
Useful when you need to pull a deleted repository from a historical snapshot or when some kind of corruption has ruined an active copy.
This can be accomplished via a simple git clone against the backup snapshot's repository:
cd $GHE_DATA_DIR
git clone current/repositories/0/nw/01/23/45/678/678.git /path/to/copy.git
Using git clone instead of a simple cp is necessary because git object data is now stored in a separate network.git repository. Copying a repository directory directly will result in a repository with no git objects.
The hard part in this scenario is finding the path of the repository in the snapshot, since repositories are no longer named on disk in simple "user/repo.git" format. There is an "info/nwo" file in each repository directory containing the "user/repo" name, which could be used for this purpose.
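As a minimal sketch of that lookup, assuming the file lives at <id>.git/info/nwo and holds a single "user/repo" line (the exact placement and the helper name find_repo_by_nwo are my assumptions, not something backup-utils provides):

```shell
#!/bin/sh
# Hypothetical helper, not part of backup-utils.
# Prints the path of every repository directory under $1 whose info/nwo file
# contains exactly the "user/repo" name given as $2.
# ASSUMPTION: the file is located at <id>.git/info/nwo.
find_repo_by_nwo() {
  snapshot_repos=$1
  wanted=$2
  find "$snapshot_repos" -type f -path '*.git/info/nwo' |
    while IFS= read -r nwo_file; do
      # $(cat ...) drops the trailing newline, so a plain comparison works.
      if [ "$(cat "$nwo_file")" = "$wanted" ]; then
        # Strip the trailing /info/nwo to get the repository directory.
        dirname "$(dirname "$nwo_file")"
      fi
    done
}
```

The result could then be fed to the clone from earlier, e.g. `git clone "$(find_repo_by_nwo current/repositories user/repo)" /path/to/copy.git`. This walks the whole tree on every call, so it is only reasonable for one-off recoveries.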
We've also been considering creating a hierarchy of symlinks on the instance to map "user/repo" names to their numeric repository directories. If we included this in the backup, locating repositories by name would be quite a bit easier.
2. Exporting all repositories in user/repo format
This is closer in spirit to the issue's original request. Given a hierarchy of repositories stored in the new filesystem layout, produce a "liberated" copy -- something you could throw into pretty much any git server environment.
This could be accomplished by applying the solution for 1) to every repository in the backup, but I think it'd be worth considering some optimizations here, because cloning each repository would both take a long time and require a large amount of disk space.
- Copy repositories into the export location via cp -rl or rsync --link-dest, hardlinking files to save space and speed things up.
- For alternated repositories, copy the entire contents of the shared object store (network.git/objects) to the repository's objects directory, also via hardlink. You can tell if a repository is alternated by the presence of an objects/info/alternates file.
With these optimizations, we should be able to reconstruct the GHE <= v2.1 repository backup structure on demand fairly quickly and without requiring 2x or more disk space, assuming the export is written to the same volume.
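To make the two steps above concrete, here is a rough sketch for a single repository. It assumes GNU cp (for -l), that network.git sits next to the repository directory in the snapshot, and that the export lives on the same volume so hardlinks are possible. The helper name export_repo and the sibling network.git location are my assumptions, and this has not been run against a real GHE snapshot:

```shell
#!/bin/sh
# Hypothetical helper, not part of backup-utils.
# Usage: export_repo <snapshot-repo-dir> <export-dir>
# ASSUMPTION: the shared object store is at ../network.git/objects relative
# to the repository directory.
export_repo() {
  src=$1     # e.g. .../678/678.git inside the snapshot
  dest=$2    # e.g. /export/user/repo.git; must not exist yet
  network_objects=$(dirname "$src")/network.git/objects

  mkdir -p "$(dirname "$dest")" || return 1
  # Hardlink the whole repository instead of copying file contents.
  cp -Rl "$src" "$dest" || return 1

  # An alternates file marks a repository that borrows objects from the
  # shared store.
  if [ -f "$dest/objects/info/alternates" ]; then
    (cd "$network_objects" && find . -type f) |
      while IFS= read -r rel; do
        target=$dest/objects/$rel
        # Keep files the repository already has.
        [ -e "$target" ] && continue
        mkdir -p "$(dirname "$target")"
        ln "$network_objects/$rel" "$target"
      done
    # rm drops only the export's link; the snapshot keeps its own copy.
    rm -- "$dest/objects/info/alternates"
  fi
}
```

Because everything in the export is a hardlink into the snapshot, the export should be treated as read-only, or copied for real, before anything writes to it. Running `git fsck` in each exported repository would be a sensible sanity check.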
Any news on this?
@azzlack I'm sorry I missed this. It's now on my radar. There are no plans to work on this in the near future but I'll discuss that with the team.
/cc @github/backup-utils