Download large repos faster using sparse checkouts with partial clones
Provide a detailed description of the proposed feature
Provide an option to perform a partial clone with a sparse checkout, to limit the time it takes to fetch content from large git repositories.
Specifically, we can use the following git commands to fetch and check out a subset of a repository:
git clone --no-checkout --filter=blob:none https://github.com/google/fonts
cd fonts
git config core.sparseCheckout true
echo ofl/ibmplexmono > .git/info/sparse-checkout
git checkout HEAD
This has been discussed before, and one of the issues that came up was that the git sparse-checkout command is relatively new and marked as experimental. However, sparse checkouts still work with older versions of git; they're just a little less pleasant to use. I've verified that the commands above work with git 2.20, which appears to be the version shipped with macOS 10.15 (which is the oldest version Homebrew supports).
I'm happy to try putting together a PR for this if it'd be something you'd be interested in accepting, but I figured I'd start with an issue so we could discuss appetite and approach.
What is the motivation for the feature?
Some formulae download content from large git repositories. The example I ran into recently was a font in the homebrew-cask-fonts repo. On my machine, cloning the full repo (google/fonts) took 1 minute 39 seconds and used 3.5 GB of disk space:
$ time git clone https://github.com/google/fonts
Cloning into 'fonts'...
remote: Enumerating objects: 58690, done.
remote: Total 58690 (delta 0), reused 0 (delta 0), pack-reused 58690
Receiving objects: 100% (58690/58690), 1.44 GiB | 19.82 MiB/s, done.
Resolving deltas: 100% (31536/31536), done.
Checking out files: 100% (12128/12128), done.
real 1m39.989s
user 1m34.537s
sys 0m37.132s
$ du -sm fonts
3451 fonts
However, the files needed by the cask are less than 2 MB:
$ du -sm fonts/ofl/ibmplexmono
2 fonts/ofl/ibmplexmono
Currently, this cask works around this problem by using the SVN download strategy, which in turn uses GitHub's subversion proxy. These days, macOS doesn't come with subversion installed by default, so you get prompted to install svn via homebrew. That seems like an unnecessary dependency, and some people have run into issues installing subversion on recent versions of macOS. Additionally, a subversion proxy isn't a standard feature of git hosts.
Using a partial clone with a sparse checkout, we can get the same benefit (fetching just the subset of the repository that we need) using only vanilla git. On my machine this took 3 seconds and used 10 MB of disk space, which is a big improvement over fetching the full repository.
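The fetch.sh script timed below is not included in the issue. Presumably it wraps the commands listed earlier; here is a minimal sketch under that assumption. The function name sparse_fetch and its arguments are hypothetical, not taken from the original script.

```shell
#!/bin/sh
# Hypothetical reconstruction of fetch.sh; the original script is not shown.
# sparse_fetch URL SUBDIR [DEST]
#   Partial clone (no blobs up front) plus a sparse checkout of SUBDIR only.
sparse_fetch() {
    url=$1
    subdir=$2
    dest=${3:-$(basename "$url" .git)}

    # Fetch commits and trees only; blobs are downloaded on demand later.
    git clone --no-checkout --filter=blob:none "$url" "$dest" || return 1

    # Legacy sparse-checkout setup, as used in the commands from the issue.
    git -C "$dest" config core.sparseCheckout true || return 1
    mkdir -p "$dest/.git/info" || return 1
    printf '%s\n' "$subdir" > "$dest/.git/info/sparse-checkout" || return 1

    # Populate the working tree with just the selected paths.
    git -C "$dest" checkout HEAD
}

# Example (performs a network clone, so it is left commented out):
# sparse_fetch https://github.com/google/fonts ofl/ibmplexmono
```

Wrapping the steps in a function keeps the URL and subdirectory as parameters, which is roughly the shape a download strategy would need; whether Homebrew would structure it this way is an open question for the PR.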
$ time bash fetch.sh
Cloning into 'fonts'...
remote: Enumerating objects: 24853, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 24853 (delta 0), reused 4 (delta 0), pack-reused 24846
Receiving objects: 100% (24853/24853), 5.38 MiB | 22.10 MiB/s, done.
Resolving deltas: 100% (15013/15013), done.
remote: Enumerating objects: 18, done.
remote: Counting objects: 100% (8/8), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 18 (delta 0), reused 0 (delta 0), pack-reused 10
Receiving objects: 100% (18/18), 732.11 KiB | 3.66 MiB/s, done.
remote: Enumerating objects: 1, done.
remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 1
Receiving objects: 100% (1/1), 72 bytes | 72.00 KiB/s, done.
Your branch is up to date with 'origin/main'.
real 0m3.073s
user 0m2.148s
sys 0m0.284s
$ du -sm fonts
10 fonts
$ ls fonts/ofl/ibmplexmono/
DESCRIPTION.en_us.html IBMPlexMono-Light.ttf IBMPlexMono-SemiBoldItalic.ttf
IBMPlexMono-Bold.ttf IBMPlexMono-LightItalic.ttf IBMPlexMono-Thin.ttf
IBMPlexMono-BoldItalic.ttf IBMPlexMono-Medium.ttf IBMPlexMono-ThinItalic.ttf
IBMPlexMono-ExtraLight.ttf IBMPlexMono-MediumItalic.ttf METADATA.pb
IBMPlexMono-ExtraLightItalic.ttf IBMPlexMono-Regular.ttf OFL.txt
IBMPlexMono-Italic.ttf IBMPlexMono-SemiBold.ttf upstream.yaml
This would let us maintain the benefits of the current SVN download strategy while depending only on git.
How will the feature be relevant to at least 90% of Homebrew users?
It'll reduce the dependency on svn for fetching large repos. The fonts casks are probably the most notable example of using svn for this optimisation, so at the very least it should mean that all users who install fonts are no longer required to install svn.
What alternatives to the feature have been considered?
- Sticking with svn (downsides described in the motivation section)
- Fetching the full repo (downsides also described in the motivation section)
- Using shallow clones to speed things up further (these are expensive for GitHub, so I avoided mentioning them)
Yeah, for the scope of doing this to replace those that use an SVN download strategy: this makes sense to me!
> macOS 10.15 (which is the oldest version homebrew supports).
Note that the download strategies should continue to work as far back as 10.10, though system git is only actually used back to 10.12. That doesn't mean older versions need to get the benefits of sparse checkouts. They should simply not break or raise errors (you can use Utils::Git.version checks where needed).
On Linux (not a concern for Cask, but is if we change the global download strategy), we support Git 2.7.0 and later, so that's actually a bigger range than macOS.
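To illustrate the kind of version gate suggested above, here is a rough shell sketch. Homebrew itself is written in Ruby and would use Utils::Git.version, so this only shows the logic, not the real implementation. The names version_at_least, git_version, choose_strategy and MIN_SPARSE_GIT are hypothetical, and the 2.20.0 threshold is just a placeholder taken from the version tested in the issue; the real cut-off would need verifying.

```shell
#!/bin/sh
# version_at_least HAVE WANT
#   Succeeds when dotted version HAVE is >= WANT, comparing numerically
#   field by field (missing fields count as 0).
version_at_least() {
    have=$1
    want=$2
    while [ -n "$have" ] || [ -n "$want" ]; do
        h=${have%%.*}
        w=${want%%.*}
        # Drop the leading field, or empty the string if it was the last one.
        case $have in *.*) have=${have#*.} ;; *) have= ;; esac
        case $want in *.*) want=${want#*.} ;; *) want= ;; esac
        h=${h:-0}
        w=${w:-0}
        [ "$h" -gt "$w" ] && return 0
        [ "$h" -lt "$w" ] && return 1
    done
    return 0
}

# Assumed placeholder threshold (see the note above); override via environment.
MIN_SPARSE_GIT=${MIN_SPARSE_GIT:-2.20.0}

# Extract the numeric part of e.g. "git version 2.20.1 (Apple Git-117)".
git_version() {
    git --version | sed -n 's/^git version \([0-9][0-9.]*\).*/\1/p'
}

# Print "sparse" when the given git version is new enough, otherwise "full",
# so older systems fall back to a plain clone instead of erroring.
choose_strategy() {
    if version_at_least "$1" "$MIN_SPARSE_GIT"; then
        echo sparse
    else
        echo full
    fi
}

# Example: choose_strategy "$(git_version)"
```

Comparing field by field matters because a plain string comparison would rank 2.7.0 above 2.20.0.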
Thanks both! I'll pick this up when I get a sec. And thanks for the pointer on the macOS versions @Bo98 – I hadn't spotted that.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Easy there stalebot, I'll get the pull request finished soon!
A laudable effort!
I addressed the changes requested and opened a new PR as the old one got (rightfully!) closed out by stalebot.
Closed by https://github.com/Homebrew/brew/pull/14035