
source-repository-package generates fat clones

Open andreasabel opened this issue 4 years ago • 21 comments

Fetching a source-repository-package seems to make a full clone of the repo:

$ cabal --version
cabal-install version 3.5.0.0
compiled using version 3.5.0.0 of the Cabal library 

$ cat cabal.project 
optional-packages: Agda

source-repository-package
  type: git
  location: https://github.com/agda/agda.git
  tag: e12f391d2539b62a62d18aef74149a9a4695a871

package Agda
  ghc-options: -fno-expose-all-unfoldings -fno-specialise-aggressively

$ cabal build all
... cloning ...

$ du -d3 -h
...
221M	./dist-newstyle/src/agda-eef06c3b3e56f437
...

Isn't there a more economical (and at the same time faster) way to clone a git repo if one only wants the version at a specific commit?
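
For illustration only: git itself can fetch a single commit by its id with `git fetch --depth 1 <remote> <sha>`, provided the server agrees to serve that commit. The sketch below tries this against a throwaway local repository, so no network is needed. The directory names, the `demo` identity, and the `uploadpack.allowAnySHA1InWant` setting on the test upstream are assumptions made for the demo; nothing here shows what cabal does internally.

```shell
#!/bin/sh
# Sketch: fetch exactly one pinned commit, without its history.
set -eu

work=$(mktemp -d)

# 1. Build a small "upstream" repository with three commits.
git init -q "$work/upstream"
for n in 1 2 3; do
  echo "revision $n" > "$work/upstream/file.txt"
  git -C "$work/upstream" add file.txt
  git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.invalid \
      commit -q -m "commit $n"
done
# Demo-only setting: let this upstream serve any commit id on request.
# Real hosting services decide this for themselves.
git -C "$work/upstream" config uploadpack.allowAnySHA1InWant true

# Pin the middle commit, the way a `tag:` field pins a full SHA.
pinned=$(git -C "$work/upstream" rev-parse HEAD~1)

# 2. Fetch only that commit, truncated to depth 1, into an empty repository.
#    file:// is used because --depth is ignored for plain local paths.
git init -q "$work/checkout"
git -C "$work/checkout" remote add origin "file://$work/upstream"
git -C "$work/checkout" fetch -q --depth 1 origin "$pinned"
git -C "$work/checkout" checkout -q FETCH_HEAD

# 3. Inspect the result.
fetched=$(git -C "$work/checkout" rev-parse HEAD)
commits=$(git -C "$work/checkout" rev-list --count HEAD)
echo "checked out $fetched with $commits commit(s) of history"
```

The checkout ends up on the pinned commit with a history of length one, which is all that is needed to build from a fixed SHA.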

andreasabel avatar Jan 28 '21 18:01 andreasabel

I think it was done like this to allow fast tag switching and global caching like cargo does (I don't think the latter was ever completed). As a workaround, if you already cloned that repo in some other path, you could set location to that path. The referenced clone will be treated as a remote and won't be modified.

fgaz avatar Jan 28 '21 21:01 fgaz

I guess there could be an option to create a local shallow clone instead, but we'd have to think about how it'd interact with other repo types, submodules...

fgaz avatar Jan 28 '21 21:01 fgaz

Relevant old discussion starting from this comment: https://github.com/haskell/cabal/issues/5586#issuecomment-665750729

fgaz avatar Jan 28 '21 21:01 fgaz

1GB+ of Amazonka gets cloned because I need a specific SHA (which is handed to cabal).

--depth=1 works wonders for avoiding a full clone of a 1.3GB repo.

dysinger avatar Jun 09 '22 12:06 dysinger

Is --depth=1 what @fgaz called a "shallow clone" above? Does the workaround work for you?

Mikolaj avatar Jun 09 '22 13:06 Mikolaj

> Is --depth=1 what @fgaz called a "shallow clone" above? Does the workaround work for you?

What is --depth=1? A cabal build option? I don't see it in the help.

If --depth=1 refers to the git clone option, how do you apply it to a git source-repository-package in a cabal file to make a 'workaround'?

jchia avatar Jul 19 '22 13:07 jchia

Yes, that's a git option. I don't think it can be applied now, but @fgaz said "I guess there could be an option to create a local shallow clone instead" and we are asking whether that would suffice (also I'm asking whether "shallow clone" is the --depth=1 clone). If that's what users need, perhaps let's open a new ticket with that specific task and we'd signal that a PR implementing the ticket would likely be accepted.

Mikolaj avatar Jul 19 '22 16:07 Mikolaj

@Mikolaj yes that would suffice. A shallow clone is a repository instance with a truncated history down to the specified --depth N entries, where N=1 is the smallest possible. Many CI pipelines use predefined explicit 1 < depth < 100 to allow for immediate (right after cloning) local branch checkouts while still optimising for bandwidth and time savings on large repository clones. Shallow clones obviously have a few downsides around history availability compared to regular full clones, but for the purpose of cabal --depth 1 or --depth 20 would work without issues. Besides, every shallow repository can later be programmatically converted into a full repository via either git pull --unshallow or git fetch --unshallow.
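
A minimal sketch of the behaviour described above, run against a throwaway local repository: a `--depth 1` clone carries a single commit, and `git fetch --unshallow` later restores the full history. The paths and the `demo` identity are assumptions for the demo, and `git rev-parse --is-shallow-repository` needs a reasonably recent git.

```shell
#!/bin/sh
# Sketch: shallow clone with --depth 1, then convert it to a full clone.
set -eu

demo=$(mktemp -d)

# Upstream with five commits.
git init -q "$demo/upstream"
for n in 1 2 3 4 5; do
  echo "revision $n" > "$demo/upstream/file.txt"
  git -C "$demo/upstream" add file.txt
  git -C "$demo/upstream" -c user.name=demo -c user.email=demo@example.invalid \
      commit -q -m "commit $n"
done

# Shallow clone. file:// is used because --depth is ignored for plain local paths.
git clone -q --depth 1 "file://$demo/upstream" "$demo/shallow"
shallow_count=$(git -C "$demo/shallow" rev-list --count HEAD)
was_shallow=$(git -C "$demo/shallow" rev-parse --is-shallow-repository)

# Later, if the history is needed after all, fetch the rest.
git -C "$demo/shallow" fetch -q --unshallow
full_count=$(git -C "$demo/shallow" rev-list --count HEAD)
still_shallow=$(git -C "$demo/shallow" rev-parse --is-shallow-repository)

echo "before: $shallow_count commit(s), shallow=$was_shallow"
echo "after:  $full_count commit(s), shallow=$still_shallow"
```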

FYI: Nix had to implement support for the flag some time ago as well - https://github.com/NixOS/nix/issues/4455

avanov avatar Jul 19 '22 21:07 avanov

Sounds good. What is the option called in Nix? Are there any other package managers or tools that do this and have good names for it? Do they take a depth parameter? Should we specify it in cabal.project or wherever the repo address is specified? Any other preliminary bikeshedding before we move the main discussion to a new ticket?

Mikolaj avatar Jul 19 '22 21:07 Mikolaj

Nix uses shallow = true to enable a hardcoded --depth 1 option, i.e. they don't allow specifying a custom depth. TravisCI allows a custom depth config option. Note, however, that git allows shallow clones to be created in more than one way, for example by depth (--depth) or by date (--shallow-since).

This needs further discussion to decide whether one or all of the methods should be supported, but the important part here is that the depth should be aligned with the checkout tag/branch option of source-repository-package:

  • https://cabal.readthedocs.io/en/3.4/cabal-package.html#pkg-field-source-repository-tag
  • https://cabal.readthedocs.io/en/3.4/cabal-package.html#pkg-field-source-repository-branch

Subsequent tag changes and repository fetches between cabal v2-build calls should be handled gracefully as well.

I assume a new source repository property depth and/or shallow-since could be added to indicate the depth in this case. Let's say something like:

source-repository-package
  type: git
  location: https://github.com/ucsd-progsys/liquidhaskell
  tag: b8dc0c2bdff8e6ea9ec4a9fc2439e89fdcd73b69
  depth: 1
  subdir:
       liquid-base
       liquid-prelude
       liquid-ghc-prim
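
The `depth` and `shallow-since` properties above are only proposals, but they would presumably correspond to the `--depth` and `--shallow-since` options of `git clone`. As a rough sketch of how the two truncate history differently, the script below builds a throwaway repository with three dated commits and clones it both ways. The dates, paths, and the helper `commit_at` are made up for the demo.

```shell
#!/bin/sh
# Sketch: truncate history by depth versus by date.
set -eu

tmp=$(mktemp -d)
git init -q "$tmp/upstream"

# Hypothetical helper: commit_at <date> <message>
commit_at() {
  echo "$2" > "$tmp/upstream/file.txt"
  git -C "$tmp/upstream" add file.txt
  GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" \
    git -C "$tmp/upstream" -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "$2"
}

commit_at "2020-01-01 12:00:00 +0000" "old"
commit_at "2021-06-01 12:00:00 +0000" "middle"
commit_at "2022-01-01 12:00:00 +0000" "new"

# By depth: keep only the newest commit.
git clone -q --depth 1 "file://$tmp/upstream" "$tmp/by-depth"

# By date: keep every commit made after the given point in time.
git clone -q --shallow-since="2021-01-01 00:00:00 +0000" \
    "file://$tmp/upstream" "$tmp/by-date"

by_depth=$(git -C "$tmp/by-depth" rev-list --count HEAD)
by_date=$(git -C "$tmp/by-date" rev-list --count HEAD)
echo "depth 1:       $by_depth commit(s)"
echo "shallow-since: $by_date commit(s)"
```

A fixed depth is predictable in size, while a date cutoff grows with upstream activity; which of the two fits a pinned `tag:` better is exactly the open design question.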

Alternatively, if cabal uses libgit2 internally (I haven't checked), it can try to utilise the same API call as Rust's Cargo here to perform shallow cloning implicitly via a new API option. As this is a relatively new option, git servers answering the call must support recent protocol versions for it to work as expected.

avanov avatar Jul 19 '22 22:07 avanov

I would simply add --depth 1 to all git cloning that cabal initiates (in all cases where this works). This should be the default. After all, you typically just want to read the repo contents at a specific commit, rather than have a clone with full history that you can use for blame etc. And, if needed, one can always unshallow manually.

andreasabel avatar Jul 21 '22 13:07 andreasabel

Agree that --depth 1 should be the default.

ulysses4ever avatar Jul 21 '22 13:07 ulysses4ever

I vote:

  1. Add a depth option to the cabal file
  2. Wait a release
  3. Set the default depth to 1

ParetoOptimalDev avatar Jul 27 '22 03:07 ParetoOptimalDev

> I vote:
>
> 1. Add a depth option to the cabal file
> 2. Wait a release
> 3. Set the default depth to 1

If there is a warning about that in the depth option description and possibly elsewhere, including the cabal manual, then IMHO this is a very civilized way of introducing the breaking change. In other words, I vote to either go whole hog on preventive warnings (do we have a volunteer for that?) or make the change in one fell swoop, which we have the right to do in a major version with a proper changelog. Half-measures are a waste of effort IMHO.

Mikolaj avatar Jul 27 '22 10:07 Mikolaj

Before we settle on depth: <natural number>, we should do a field study of what different VCSs offer, with the hope of finding an interface that not just git supports (an alternative could be shallow: <boolean>).

andreasabel avatar Jul 27 '22 14:07 andreasabel

> Before we settle on depth: <natural number> we should have a field study what different VCS have

Sadly, most projects have switched to git. darcs has --lazy; I don't believe Mercurial has a similar thing (without using extensions). shallow seems to capture the idea, and adding "downloads a shallow clone, if possible" to the option documentation should be enough.

Data point for what Haskell devs use for versioning.

ffaf1 avatar Jul 27 '22 15:07 ffaf1

~~Mercurial seems to have shallow clone via the --root <rev> option: https://www.mercurial-scm.org/wiki/ShallowClone~~ Sorry, this was just a proposal; it seems Mercurial doesn't support it.

andreasabel avatar Jul 27 '22 15:07 andreasabel

> Before we settle on depth: <natural number> we should have a field study what different VCS have, with the hope of finding an interface that not just git supports (an alt could be shallow: <boolean>).

I don't think you want to abstract this detail. For git, give depth; for darcs, give lazy; etc.

I guess the current interface tries to abstract away the dvcs details though?

ParetoOptimalDev avatar Jul 27 '22 18:07 ParetoOptimalDev

> I don't think you want to abstract this detail. For git give depth, darcs give lazy, etc.

I suspect fat/shallow is abstractable (“give me just enough to build this project with”). Whether that is good UX I cannot say, as I have never used the feature!

ffaf1 avatar Jul 27 '22 18:07 ffaf1

@fgaz wrote

> As a workaround, if you already cloned that repo in some other path, you could set location to that path. The referenced clone will be treated as a remote and won't be modified.

This still clones the whole thing. Even if it clones from a local source, it copies everything, swallowing disk space.

andreasabel avatar Jul 31 '23 16:07 andreasabel