Add a shallow-clone option for git packages
At my company we are currently combining Unity with Flutter through https://pub.dev/packages/flutter_unity_widget, and we host the Unity widget dependency in a git repo. The repo gets big really quickly, so we have to keep deleting tags.
I would like to submit a PR that lets a user tell pub to make a shallow clone of a Git repository instead of a mirror clone of the remote repository.
For example, a user could use the following command:
dart pub add http --git-url=https://github.com/my/http.git --git-ref=tmpfixes --git-shallow-clone=true
or inside the pubspec.yaml file
dependencies:
  vm_service:
    git:
      url: https://dart.googlesource.com/sdk
      ref: refs/changes/80/156980/3
      path: pkg/vm_service
      shallow-clone: true
There is also the possibility of specifying the depth of the shallow clone like so, instead of passing a boolean flag:
dart pub add http --git-url=https://github.com/my/http.git --git-ref=tmpfixes --git-shallow-clone=1
or inside the pubspec.yaml file
dependencies:
  vm_service:
    git:
      url: https://dart.googlesource.com/sdk
      ref: refs/changes/80/156980/3
      path: pkg/vm_service
      shallow-clone: 1
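To make the two forms of the option concrete, here is a rough sketch of how a value could be translated into git clone arguments. The clone_args helper is hypothetical and is not part of pub; treating true as depth 1 and falling back to a mirror clone for anything else are assumptions made only for illustration.

```shell
# Hypothetical helper (not part of pub): map a shallow-clone value to
# git clone arguments.
#   true                 -> --depth 1   (assumed meaning of the boolean form)
#   positive integer N   -> --depth N   (no leading zeros)
#   anything else        -> --mirror    (the current default strategy)
clone_args() {
  case "$1" in
    true) echo "--depth 1" ;;
    ''|false|0*) echo "--mirror" ;;
    *[!0-9]*) echo "--mirror" ;;
    *) echo "--depth $1" ;;
  esac
}

# Print the command that would be run for a depth of 1.
echo "git clone $(clone_args 1) https://github.com/my/http.git"
```

With this mapping, leaving the option out keeps today's behaviour, so existing pubspec.yaml files would be unaffected.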
There is a similar issue here: https://github.com/dart-lang/pub/issues/2686.
This is a private package, right? And the issue is that it's large, so the git dependency with a full clone takes up a lot of space and bandwidth.
So is it correct that the possible solutions might be:
- Use a private package repository? (Probably, less preferable because you can't piggy back off the authentication you already have for git)
- git LFS (maybe?), if we tweaked pub to allow it?
- shallow git clones?
I'm curious: if you have multiple Dart SDKs installed, how will shallow clones affect the PUB_CACHE, and how will an old Dart SDK interact with it? (There is possibly a solution, I'm just saying we need to figure this out.)
Also, how do git shallow clones actually work? How shallow are they? What does the depth mean, and when is that sensible? Are they supported by all git versions, or will we need feature detection?
Should we migrate to only use shallow clones? Or are full clones still sensible in some scenarios?
Sorry for the dumb questions, I'm not fully versed in all the details of modern git. And anything that changes the layout of PUB_CACHE requires care to ensure it works when users upgrade/downgrade SDKs.
Hi @jonasfj, I think most of your questions are valid.
I am not sure the following options you suggested will resolve the issue, because we would still need to pull a large part of our repository's history.
- Use a private package repository? (Probably, less preferable because you can't piggy back off the authentication you already have for git)
- git LFS (maybe?), if we tweaked pub to allow it?
About this question:
I'm curious: if you have multiple Dart SDKs installed, how will shallow clones affect the PUB_CACHE, and how will an old Dart SDK interact with it? (There is possibly a solution, I'm just saying we need to figure this out.)
A project using an older version of Dart will use the same version of the package cached in PUB_CACHE. The only difference between the mirror-cloned and shallow-cloned versions is that the shallow-cloned package will have a shorter commit history than the mirror clone.
This is how the depth option works:
--depth
Create a shallow clone with a history truncated to the specified number of commits. Implies --single-branch unless --no-single-branch is given to fetch the histories near the tips of all branches. If you want to clone submodules shallowly, also pass --shallow-submodules.
It basically allows us to pull a specific number of commits instead of fetching the entire git repository history.
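As a small illustration of the effect of --depth, the sketch below builds a throwaway local repository with three commits and clones it both ways. The directory names and the demo identity are made up for the example. The file:// URL matters here because git ignores --depth for plain local-path clones.

```shell
set -eu
work=$(mktemp -d)

# Build a throwaway origin repository with three commits.
git init -q "$work/origin"
for i in 1 2 3; do
  echo "$i" > "$work/origin/file.txt"
  git -C "$work/origin" add file.txt
  git -C "$work/origin" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "commit $i"
done

# Clone it once with a truncated history and once in full.
git clone -q --depth 1 "file://$work/origin" "$work/shallow"
git clone -q "file://$work/origin" "$work/full"

# Count the commits reachable from HEAD in each clone.
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)
full_count=$(git -C "$work/full" rev-list --count HEAD)
echo "shallow: $shallow_count commits, full: $full_count commits"
```

The shallow clone only contains the tip commit, while the full clone contains all three.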
Should we migrate to only use shallow clones? Or are full clones still sensible in some scenarios?
Basically, the idea is to make a mirror clone when the shallow-clone option is not provided, i.e. mirror clone stays the default strategy for fetching git packages, and a shallow clone is only made when the shallow-clone option is provided.
I get that, my question is if it's better to always make a shallow clone.
Use a private package repository?
Would certainly alleviate concerns about having a huge git history.
I am not sure if it is best to always make a shallow clone, but I think it would be good to have the option of a shallow clone when a mirror clone becomes infeasible.
Any update on this?
Reading this: https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/ made me think that partial blob-less clones, or maybe even partial tree-less clones, might work well for pub. That would save a lot of bandwidth, while working well with how e.g. GitHub is serving repos.
I guess there are still a lot of questions to answer before attempting this.
- Can we git fetch in a tree-less fashion?
- Will this interact well with existing pub caches with full checkouts?
- Is this too breaking to do always? (Now you can no longer rely on the past history of your dependencies being available offline.)
- Are there any other unintended side-effects?