I would like to contribute a "safe mode" for working with untrusted repos
For f-droid.org, we build thousands of Android apps from git repos. To reduce our attack surface and work towards "least authority", we use a custom Git wrapper that locks down many features that we never need and that carry a higher risk of vulnerabilities. I would like to rework this to be part of GitPython, so I'm opening this issue to see whether the GitPython maintainers would be interested in merging it.
I'm open on the API; it could be something like this:
git_repo = git.repo.Repo('.', safe=True)
The goal would then be that all invocations of Git include these kinds of options:
core.askpass = /bin/true
core.hooksPath = /dev/null
core.sshCommand = /bin/true
credential.helper = /bin/true
http.emptyAuth = true
protocol.allow = never
protocol.https.allow = always
url.https://.insteadOf = ssh://
And run with these env vars:
GIT_TERMINAL_PROMPT=0
GIT_ASKPASS=/bin/true
SSH_ASKPASS=/bin/true
GIT_SSH=/bin/true # for git < 2.3
This then hopefully only allows unauthenticated access to HTTPS repos, and prevents the execution of any command besides git. This would eliminate risks like these:
- https://github.com/gitpython-developers/GitPython/security/advisories/GHSA-2mqj-m65w-jghx
- https://stackoverflow.com/questions/74200395/is-it-dangerous-to-open-or-clone-a-git-repository-from-an-untrusted-source
- https://github.com/git/git/security/advisories/GHSA-vm9j-46j9-qvq4
- https://git-scm.com/docs/git#_security
- https://github.blog/open-source/git/securing-git-addressing-5-new-vulnerabilities/
- https://nvd.nist.gov/vuln/detail/CVE-2017-1000117
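As a rough illustration of the proposal, the config options and environment variables listed above could be assembled into one argument list and one environment before Git is invoked. This is a minimal sketch using only the standard library, not GitPython's API; `SAFE_CONFIG`, `SAFE_ENV`, and `build_safe_invocation` are hypothetical names chosen for this example.

```python
import os

# Options from the list above, passed as "-c key=value" so that they take
# precedence over values found in config files.
SAFE_CONFIG = [
    "core.askpass=/bin/true",
    "core.hooksPath=/dev/null",
    "core.sshCommand=/bin/true",
    "credential.helper=/bin/true",
    "http.emptyAuth=true",
    "protocol.allow=never",
    "protocol.https.allow=always",
    "url.https://.insteadOf=ssh://",
]

SAFE_ENV = {
    "GIT_TERMINAL_PROMPT": "0",
    "GIT_ASKPASS": "/bin/true",
    "SSH_ASKPASS": "/bin/true",
    "GIT_SSH": "/bin/true",  # for git < 2.3
}


def build_safe_invocation(args, base_env=None):
    """Return (argv, env) for running `git <args>` with the safe settings."""
    argv = ["git"]
    for option in SAFE_CONFIG:
        argv += ["-c", option]
    argv += list(args)
    env = dict(os.environ if base_env is None else base_env)
    env.update(SAFE_ENV)  # safe values replace any inherited ones
    return argv, env


argv, env = build_safe_invocation(["fetch", "origin"])
print(" ".join(argv))
```

The resulting pair could then be handed to something like `subprocess.run(argv, env=env)`. How this would map onto GitPython's own command wrapper is left open here.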
This sounds like a very nice feature to have, and one that wouldn't affect anyone who didn't opt in.
Yes, a PR would definitely be welcome.
Related question, is there a way to set the equivalent of GIT_TERMINAL_PROMPT=0 purely within gitpython (without setting an env var)?
GIT_TERMINAL_PROMPT=0 is functionally equivalent to core.askpass = /bin/true as far as I understand it. The env vars can override the config values, so I want to set the env vars to make sure that the config values are never overridden.
There is another thing that we use which is hard to generalize. The core goal is to rewrite all remote URLs to https://, whenever possible, then only support https as a protocol. This is necessary to support submodules when using this "safe" mode. ssh:// URLs are easy to handle (url.https://.insteadOf = ssh://), but the rest are not. This is the best I could come up with:
config = []
for domain in ('bitbucket.org', 'github.com', 'gitlab.com', 'codeberg.org'):
    config.append(f'url.https://u:p@{domain}/.insteadOf=git@{domain}:')
    config.append(f'url.https://u:p@{domain}.insteadOf=git://{domain}')
    config.append(f'url.https://u:p@{domain}.insteadOf=https://{domain}')
Anyone know a way to generalize those insteadOf rules to any domain?
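One possible direction, sketched here under assumptions rather than offered as a known solution: since insteadOf only matches literal prefixes, the rules could be generated per host from URLs that are already known (for example, ones read from .gitmodules by some other means). The regex and the function name `instead_of_rules` are hypothetical, and only scp-like `user@host:` remotes are handled.

```python
import re

# scp-like remote: user@host:path, with no "://" scheme in front.
SCP_LIKE = re.compile(r"^(?P<user>[^@/:\s]+)@(?P<host>[A-Za-z0-9][A-Za-z0-9.-]*):")


def instead_of_rules(urls):
    """Build url.<base>.insteadOf options for each user@host: prefix in urls."""
    rules = []
    seen = set()
    for url in urls:
        match = SCP_LIKE.match(url)
        if not match:
            continue  # not an scp-like URL, nothing to generate
        prefix = match.group(0)  # e.g. "git@github.com:"
        if prefix in seen:
            continue
        seen.add(prefix)
        rules.append(f"url.https://{match['host']}/.insteadOf={prefix}")
    return rules


print(instead_of_rules([
    "git@github.com:illarionov/OsmDroid.git",
    "git@example.org:some/repo.git",
    "https://gitlab.com/eighthave/fdroid-verification-server.git",
]))
```

This still requires seeing the URLs before Git does, so it does not remove the need to parse them from somewhere.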
This seems to be the code handling the rewrites in Git - maybe from there it becomes clear how it can be used more generally?
Maybe it's the code that parses the configuration that limits what it can do, though.
I've finally started implementing this. The first thing I'm looking at is a way to read commit IDs (e.g. repo.head.commit.binsha) while being utterly certain that git is never executed. git.refs.symbolic.SymbolicReference.dereference_recursive() provides this already, with some extra work. Right now, in fdroidserver, we're using our own function based on dereference_recursive(). Maybe repo.head.commit.binsha already avoids executing git, but we need some kind of guarantee. Any tips on a better approach? For instance, is there any internal API that stops git from being executed?
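To make the "never execute git" requirement concrete, here is a minimal sketch that resolves HEAD to a commit ID purely by reading files. It is not GitPython's implementation and not the fdroidserver function; `read_head_sha` and its helpers are hypothetical names. It assumes a SHA-1 repository with the classic files-based ref storage (loose refs plus packed-refs), so reftable and SHA-256 repositories are not covered.

```python
import os
import re

HEX_SHA = re.compile(r"[0-9a-f]{40}")  # assumption: SHA-1 object IDs


def _check_ref_name(ref):
    """Reject names that could escape the git directory (untrusted input)."""
    parts = ref.split("/")
    if parts[0] != "refs" or ".." in parts or "" in parts:
        raise ValueError(f"suspicious ref name: {ref!r}")


def _lookup_packed_ref(git_dir, ref):
    packed = os.path.join(git_dir, "packed-refs")
    if os.path.isfile(packed):
        with open(packed, encoding="utf-8") as fp:
            for line in fp:
                if line.startswith(("#", "^")):
                    continue  # header comment or peeled-tag line
                sha, _, name = line.strip().partition(" ")
                if name == ref and HEX_SHA.fullmatch(sha):
                    return sha
    raise ValueError(f"cannot resolve {ref!r}")


def read_head_sha(git_dir, max_depth=5):
    """Resolve HEAD to a hex commit ID by reading files only."""
    ref = "HEAD"
    for _ in range(max_depth):
        path = os.path.join(git_dir, *ref.split("/"))
        if not os.path.isfile(path):
            # No loose ref file, so fall back to packed-refs.
            return _lookup_packed_ref(git_dir, ref)
        with open(path, encoding="utf-8") as fp:
            content = fp.read().strip()
        if HEX_SHA.fullmatch(content):
            return content
        if not content.startswith("ref: "):
            raise ValueError(f"unexpected content in {ref!r}")
        ref = content[len("ref: "):]  # symbolic ref, follow it
        _check_ref_name(ref)
    raise ValueError("too many levels of symbolic refs")
```

Because the ref name comes from an untrusted repository, the sketch validates it before building a path. A real implementation would probably want stricter checks than these.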
It's very likely that using gitoxide would be preferred here. It has its own security model which makes the use of untrusted configuration impossible, and I'd be inclined to say that it can probably do what you'd need it to.
Probably that's not an option there though, and I suppose there can be other means to ensure GitPython can't do things it shouldn't do with untrusted repositories. Testing this properly will certainly need some consideration as well.
gitoxide sounds great, but fdroidserver is in Python, and using pure Python makes a lot of distribution issues much easier, hence GitPython. I think GitPython will only need some small tweaks to do what we need it to.
#2029 implements the approach that I think works best. I'm open to suggestions from the GitPython experts.
This seems to be the code handling the rewrites in Git - maybe from there it becomes clear how it can be used more generally?
Maybe it's the code that parses the configuration that limits what it can do, though.
The tricky case is url.https://{domain}/.insteadOf=git@{domain}:, e.g. rewriting git@github.com:illarionov/OsmDroid.git to https://github.com/illarionov/OsmDroid.git. It has to replace both git@ with https:// and : with /, but the domain is in between. The insteadOf string replacement works like a startswith(), so the only way to replace that : is by including the domain name.
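The prefix-only behaviour can be modelled in a few lines. This is a simplified sketch of the documented rule (the longest matching insteadOf prefix is replaced by its base), not Git's actual code; `apply_instead_of` is a hypothetical name and `example.org` is a placeholder host.

```python
def apply_instead_of(url, rules):
    """Rewrite url the way url.<base>.insteadOf does: longest prefix wins.

    rules maps each insteadOf prefix to its replacement base.
    """
    matches = [prefix for prefix in rules if url.startswith(prefix)]
    if not matches:
        return url
    longest = max(matches, key=len)
    return rules[longest] + url[len(longest):]


# A rule that includes the domain can replace the ':' along with the prefix.
rules = {
    "ssh://": "https://",
    "git@github.com:": "https://github.com/",
}
print(apply_instead_of("git@github.com:illarionov/OsmDroid.git", rules))
# https://github.com/illarionov/OsmDroid.git

# A generic "git@" rule cannot reach the ':' after the domain.
print(apply_instead_of("git@example.org:some/repo.git", {"git@": "https://"}))
# https://example.org:some/repo.git
```

The second result still contains the `:` separator, which is why a domain-independent rule does not produce a usable HTTPS URL.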
So there would need to be a way to parse the URLs from .gitmodules first in order to modify them. That seems complicated. Otherwise, just a few statically defined domain names would cover the vast majority of cases:
"-c",
"url.https://bitbucket.org/.insteadOf=git@bitbucket.org:",
"-c",
"url.https://github.com/.insteadOf=git@github.com:",
"-c",
"url.https://gitlab.com/.insteadOf=git@gitlab.com:",
"-c",
"url.https://codeberg.org/.insteadOf=git@codeberg.org:",
That sounds like a start. More importantly, I think the weaknesses of the safe parameter should be very well documented.
Here is my attempt at documenting the setup and known weaknesses: https://github.com/gitpython-developers/GitPython/pull/2029/files#diff-35a18a749eb4d6efad45e56e78a9554926be5526e2ba2159b44311e718450e88R957
- I don't really understand what receive.procReceiveRefs and uploadpack.packObjectsHook do. Is there anything I should add about them?
- I understand what remote.<name>.vcs does, but I don't know what kinds of risks it opens up. I did some quick tests and it seems that remote.<name>.vcs is blocked by protocol.allow=never, but I don't know if that's guaranteed.
Looks like the only thing of concern would be a malicious URL that exploits git-remote-https. I wonder if it would be worthwhile to sanitize the URL first?
$ git config protocol.allow never
$ git config protocol.https.allow always
$ git remote add origin https://gitlab.com/eighthave/fdroid-verification-server.git
$ git remote update -p
remote: Enumerating objects: 256, done.
remote: Counting objects: 100% (191/191), done.
remote: Compressing objects: 100% (158/158), done.
remote: Total 256 (delta 113), reused 53 (delta 32), pack-reused 65 (from 1)
Receiving objects: 100% (256/256), 94.36 KiB | 399.00 KiB/s, done.
Resolving deltas: 100% (143/143), done.
From https://gitlab.com/eighthave/fdroid-verification-server
* [new branch] master -> origin/master
$ git fetch ssh://gitlab.com/eighthave/fdroid-verification-server.git
fatal: transport 'ssh' not allowed
$ git remote add ssh ssh://gitlab.com/eighthave/fdroid-verification-server.git
$ git remote update -p
Fetching origin
Fetching ssh
fatal: transport 'ssh' not allowed
error: could not fetch ssh
$ git config remote.ssh.vcs ssh
$ git remote update -p
Fetching origin
Fetching ssh
fatal: transport 'ssh' not allowed
error: could not fetch ssh
$ git config remote.ssh.vcs https
$ git remote update -p
Fetching origin
Fetching ssh
fatal: unable to access 'ssh://gitlab.com/eighthave/fdroid-verification-server.git/': Protocol "ssh" not supported or disabled in libcurl
$ git fetch https::git@gitlab.com:eighthave/fdroid-verification-server.git --verbose
warning: url has no scheme: git@gitlab.com:eighthave/fdroid-verification-server.git/
fatal: credential url cannot be parsed: git@gitlab.com:eighthave/fdroid-verification-server.git/
$
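On the question of sanitizing the URL before it reaches git-remote-https, one conservative option is to accept only plain https:// URLs and reject everything else up front. This is a minimal sketch with the standard library; `is_plain_https_url` is a hypothetical name, and the specific checks are assumptions about what "sanitized" should mean here, not an exhaustive defence.

```python
from urllib.parse import urlsplit


def is_plain_https_url(url):
    """Return True only for https:// URLs without credentials or odd characters."""
    # Whitespace and control characters have no place in a remote URL.
    if any(ch.isspace() or ord(ch) < 0x20 or ord(ch) == 0x7F for ch in url):
        return False
    if url.startswith("-"):
        return False  # could be mistaken for a command-line option
    try:
        parts = urlsplit(url)
        _ = parts.port  # accessing .port raises ValueError if it is malformed
    except ValueError:
        return False
    if parts.scheme != "https" or not parts.hostname:
        return False
    if parts.username is not None or parts.password is not None:
        return False
    return True


for candidate in (
    "https://gitlab.com/eighthave/fdroid-verification-server.git",
    "ssh://gitlab.com/eighthave/fdroid-verification-server.git",
    "https::git@gitlab.com:eighthave/fdroid-verification-server.git",
):
    print(candidate, is_plain_https_url(candidate))
```

A check like this would sit in front of the protocol.allow settings rather than replace them, since submodule URLs are resolved by Git itself.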