source-repository-package generates fat clones
Fetching a source-repository-package seems to make a full clone of the repo:
$ cabal --version
cabal-install version 3.5.0.0
compiled using version 3.5.0.0 of the Cabal library
$ cat cabal.project
optional-packages: Agda
source-repository-package
  type: git
  location: https://github.com/agda/agda.git
  tag: e12f391d2539b62a62d18aef74149a9a4695a871
package Agda
  ghc-options: -fno-expose-all-unfoldings -fno-specialise-aggressively
$ cabal build all
... cloning ...
$ du -d3 -h
...
221M ./dist-newstyle/src/agda-eef06c3b3e56f437
...
Isn't there a more economical (and at the same time faster) way to clone a git repo if one only wants the version at a specific commit?
I think it was done like this to allow fast tag switching and global caching like cargo does (I don't think the latter was ever completed). As a workaround, if you already cloned that repo in some other path, you could set location to that path. The referenced clone will be treated as a remote and won't be modified.
I guess there could be an option to create a local shallow clone instead, but we'd have to think about how it'd interact with other repo types, submodules...
Relevant old discussion starting from this comment: https://github.com/haskell/cabal/issues/5586#issuecomment-665750729
1GB+ of Amazonka because I need a specific sha (which is handed to cabal)
--depth=1 works wonders at not cloning a 1.3GB repo fully
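For readers wondering what fetching just one specific commit looks like with plain git, here is a minimal sketch. It is not what cabal does today; `fetch_single_commit` is a hypothetical helper name, and the approach assumes the server permits fetching the requested commit by its hash.

```shell
# Hypothetical helper (not part of cabal): fetch exactly one commit into DEST.
# Usage: fetch_single_commit <repo-url> <commit-sha> <dest-dir>
fetch_single_commit() {
  url=$1
  sha=$2
  dest=$3
  # Start from an empty repository instead of cloning everything.
  git init --quiet "$dest" || return 1
  git -C "$dest" remote add origin "$url" || return 1
  # --depth 1 truncates history to the requested commit only.
  git -C "$dest" fetch --quiet --depth 1 origin "$sha" || return 1
  # FETCH_HEAD now names that commit; check it out as a detached HEAD.
  git -C "$dest" checkout --quiet --detach FETCH_HEAD
}
```

Called with a repository URL, the full commit hash from the `tag:` field, and a target directory, this should leave a working tree at that commit with a single-commit history.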
Is --depth=1 what @fgaz called a "shallow clone" above? Does the workaround work for you?
What is --depth=1? A cabal build option? I don't see it in the help.
If --depth=1 refers to the git clone option, how do you apply it to a git source-repository-package in a cabal file to make a 'workaround'?
Yes, that's a git option. I don't think it can be applied now, but @fgaz said "I guess there could be an option to create a local shallow clone instead" and we are asking whether that would suffice (also I'm asking whether "shallow clone" is the --depth=1 clone). If that's what users need, perhaps let's open a new ticket with that specific task and we'd signal that a PR implementing the ticket would likely be accepted.
@Mikolaj yes, that would suffice. A shallow clone is a repository instance with history truncated to the specified --depth N entries, where N=1 is the smallest possible. Many CI pipelines use a predefined explicit depth (1 < depth < 100) to allow for immediate (right after cloning) local branch checkouts while still optimising for bandwidth and time savings on large repository clones. Shallow clones obviously have a few downsides around history availability compared to regular full clones, but for cabal's purposes --depth 1 or --depth 20 would work without issues. Besides, every shallow repository can later be programmatically converted into a full repository via either git pull --unshallow or git fetch --unshallow.
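To make the shallow/unshallow round trip concrete, here is a small sketch using only standard git options. The helper names `shallow_clone`, `is_shallow` and `unshallow` are made up for illustration and are not anything cabal provides.

```shell
# Hypothetical helpers for illustration; only standard git options are used.

# shallow_clone <repo-url> <dest-dir> <depth>: clone with truncated history.
shallow_clone() {
  git clone --quiet --depth "$3" "$1" "$2"
}

# is_shallow <dir>: succeeds if the repository has truncated history.
is_shallow() {
  [ "$(git -C "$1" rev-parse --is-shallow-repository)" = "true" ]
}

# unshallow <dir>: download the missing history, making the clone complete.
unshallow() {
  git -C "$1" fetch --quiet --unshallow
}
```

Note that git ignores --depth when cloning from a plain local path, so a local source needs a file:// URL for the depth to take effect.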
FYI: Nix had to implement support for the flag some time ago as well - https://github.com/NixOS/nix/issues/4455
Sounds good. What is the option called in Nix? Any other package managers or tools that do that and have good names? Do they take the depth parameter? Should we rather specify that in cabal.project or somewhere where the repo address is specified? Any other preliminary bikeshedding before we move for the main one to a new ticket?
Nix uses shallow = true to enable a hardcoded --depth 1 option, i.e. they don't allow specifying a custom depth. TravisCI allows for a custom depth config option. Note, however, that git allows shallow clones to be created via:
- --depth=N
- --shallow-since=<date>
- --shallow-exclude=<revision>, which is probably not very useful for the purpose of cabal's source-repository-package.
It needs a further discussion to decide whether one or all of the methods should be supported, but the important part here is that the depth should be aligned with the checkout tag/branch option of source-repository-package:
- https://cabal.readthedocs.io/en/3.4/cabal-package.html#pkg-field-source-repository-tag
- https://cabal.readthedocs.io/en/3.4/cabal-package.html#pkg-field-source-repository-branch
Subsequent tag changes and repository fetches between cabal v2-build calls should be handled gracefully as well.
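As a rough illustration of what handling a tag change could look like at the git level, the sketch below re-points an existing shallow checkout at a different revision without fetching any history. `retarget_shallow` is a hypothetical name, the remote is assumed to be called origin, and this is not how cabal currently behaves.

```shell
# Hypothetical helper: move an existing shallow checkout to another revision.
# Usage: retarget_shallow <repo-dir> <tag-or-commit-sha>
retarget_shallow() {
  # Fetch only the requested revision, again without its history.
  git -C "$1" fetch --quiet --depth 1 origin "$2" || return 1
  # Switch the working tree to the freshly fetched commit.
  git -C "$1" checkout --quiet --detach FETCH_HEAD
}
```

The previously fetched commit stays in the local object store, so switching back to it later would not require downloading it again.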
I assume a new source repository property depth and/or shallow-since could be added to indicate the depth in this case. Let's say something like:
source-repository-package
  type: git
  location: https://github.com/ucsd-progsys/liquidhaskell
  tag: b8dc0c2bdff8e6ea9ec4a9fc2439e89fdcd73b69
  depth: 1
  subdir:
    liquid-base
    liquid-prelude
    liquid-ghc-prim
Alternatively, if cabal uses libgit internally (I haven't checked), it can try to utilise the same API call as Rust's Cargo here to perform shallow cloning implicitly via a new API option. As this is a relatively new option, git servers answering the call must support recent protocol versions for the option to work as expected.
I would simply add --depth 1 for all git cloning that cabal initiates (in all cases where this works). This should be the default. After all, you typically just want to read the repo contents for a specific commit, rather than having a clone with history and all that, which you can use for blame etc. And, if needed, one can always manually unshallow.
Agree that --depth 1 should be the default.
I vote:
- Add a depth option to the cabal file
- Wait a release
- Set the default depth to 1
If there is a warning about that in the depth option description and possibly elsewhere, including the cabal manual, then IMHO this is a very civilized way of introducing the breaking change. In other words, I vote to either go full hog on preventive warnings (do we have a volunteer for that?) or make the change in one fell swoop, which we have the right to do in a major version with a proper changelog. Half-measures are a waste of effort IMHO.
Before we settle on depth: <natural number> we should have a field study of what different VCS offer, with the hope of finding an interface that not just git supports (an alternative could be shallow: <boolean>).
Sadly, most projects have switched to git. darcs has --lazy, I don't believe Mercurial has a similar thing (without using extensions). shallow seems to capture the idea and adding a "downloads a shallow clone, if possible" in the option documentation should be enough.
Data point for what Haskell devs use for versioning.
~~Mercurial seems to have shallow clone via the --root <rev> option: https://www.mercurial-scm.org/wiki/ShallowClone~~
Sorry, this was just a proposal; it seems Mercurial doesn't support it.
I don't think you want to abstract this detail. For git give depth, darcs give lazy, etc.
I guess the current interface tries to abstract away the dvcs details though?
I suspect fat/shallow is abstractable (“give me just enough to build this project with”). Whether that is good UX I cannot say, as I have never used the feature!
@fgaz wrote
As a workaround, if you already cloned that repo in some other path, you could set location to that path. The referenced clone will be treated as a remote and won't be modified.
This still clones the whole thing. Even if it clones from a local source, it copies everything, swallowing disk space.