git-signatures icon indicating copy to clipboard operation
git-signatures copied to clipboard

Fix three zero days

Open oxij opened this issue 5 years ago • 45 comments

... and I think I know about the fourth one, similar to the third, but I'm too tired for now already.

First 5 commits are more or less cleanup. Then there are 3 pairs with each pair first adding a failing test that should have worked and then the next commit fixing git-signatures so that it would actually work.

I also strongly suggest:

  • rewriting this for bash -e, I frequently find errors this way,
  • using four spaces for indent :)

oxij avatar Oct 02 '18 05:10 oxij

Looks solid at first glance. Will look over in more detail tomorrow.

Also:

using four spaces for indent :)

Tab indentation is actually "standard" in bash because of heredocs. Indented heredocs are only valid with leading tabs, but not leading spaces. Switching between tabs and spaces mid code is just a mess, so I just resolve that bash is like Go and mandates tabs. In python I use 4 spaces though because PEP8 ;)

rewriting this for bash -e, I frequently find errors this way,

I would welcome a PR for that after we get this and the patch-id PR merged

lrvick avatar Oct 02 '18 08:10 lrvick

I got a bit tired regenerating keys by hand while continuing this branch. Added 0f816f83b9adb2049ccf3e274ae9b59883eb2bea in the middle of history here and changed later commits to respect it.

rewriting this for bash -e, I frequently find errors this way, I would welcome a PR for that after we get this and the patch-id PR merged

Ok.

using four spaces for indent :) Tab indentation is actually "standard" in bash because of heredocs. Indented heredocs are only valid with leading tabs, but not leading spaces. Switching between tabs and spaces mid code is just a mess, so I just resolve that bash is like Go and mandates tabs. In python I use 4 spaces though because PEP8 ;)

I'd prefer usage with heredocs being unindented than tabs everywhere, tbh.

oxij avatar Oct 02 '18 21:10 oxij

Added another test that shows that my fourth zero day didn't work out.

Give this another 24h before merging just in case but I think this is now LGTM.

oxij avatar Oct 02 '18 22:10 oxij

Refactored tests, rebased to make the order beautiful, added two more failing tests and a fix.

Judging by GPG's docs there are two more non-tested cases: BADSIG and EXPSIG (EXPKEYSIG is tested now).

EXPSIG I'll manage later, but I tried and failed making it to produce BADSIG status no matter what I do. (I tried to break a signature in a PGP packet, maybe trying to make a fake key would be easier, but I have no idea how to generate those cheaply enough for tests).

oxij avatar Oct 03 '18 18:10 oxij

@oxij So a lot of work was going on in the sign-patch-id branch which just merged. Most of it should not touch your work, but can you rebase?

lrvick avatar Oct 03 '18 21:10 lrvick

Also most of us involved in the project are on irc.hashbang.sh/#! 6697 if you want to keep in sync at any point.

lrvick avatar Oct 03 '18 21:10 lrvick

I got a bit tired regenerating keys by hand while continuing this branch. Added 0f816f8 in the middle of history here and changed later commits to respect it.

I actually went this route initially, but realized it makes most CI systems hang until timeout as they become entropy starved. If you want to go this route that is totally cool, but you may have to update the CI jobs to setup haveged or something to keep the entropy pool full, or stub in something to spam to /dev/random to fill the pool before generating keys.

lrvick avatar Oct 03 '18 21:10 lrvick

What is the reasoning for breaking out test to have its own makefile?

Tradition, simplicity, I guess. I made the tests directory mostly self-sufficient and it became easier to add a Makefile there than cd tests everywhere else.

CI and randomness

Yeah, I'll add writes to /dev/random after I hit the issue, works for me now.

So a lot of work was going on in the sign-patch-id branch which just merged. Most of it should not touch your work, but can you rebase?

I understand that this now got awkward, and all, but I read the changes and I disagree with that direction.

I just fixed a bunch of security issues here (the replay and "silly names" tests were the funniest) and that branch introduced yet another one by allowing out-of-order application of patches =/. As I pointed out (albeit in question form) in NixOS rfcs thread, IMO patch-id signing should, if at all, be implemented on top of revhash signing in opt-in manner (that is, revhash signature should permit reapplying it to a patch with the same patch-id), not vice versa. I want revhash-based semantics 99.999% of the time and I don't see any elegant solutions for going from patch-id-based system to revhash based system. Hence, I disagree.

I also don't like the "JSON" direction you outlined in NixOS thread. I think that JSON is cancer and that signed message format should be fixed in stone, and changes to it should be versioned on per-notes-branch basis, not per-revhash basis. Again, for security reasons.

I'm sorry, but I'm very opinionated on such topics and I can't make myself see any point in rebasing this on top of current master now. I'm going to continue by myself, let the community judge the results, I guess.

However, feel free to take anything you like from here, and, later, from the rest of my fork.

IRC

Believe me, you don't want me on your IRC. :)

oxij avatar Oct 03 '18 23:10 oxij

Believe me, you don't want me on your IRC. :)

We've seen worse. Feel free to join us: ircs://irc.hashbang.sh:6697/#!,#!social - TLS required.

RyanSquared avatar Oct 04 '18 01:10 RyanSquared

@oxij I don't think forking is required, or that JSON is required. If there is another simple to parse key/value format you have in mind, let's hear it. I would like to see an alternative suggestion rather than just declaring the current suggested format cancer :)

I also agree that we need revhash on top of patch-id. I was going to address that in my next revision, but wanted to get patch-id in first now that we proves that out as a standalone solution. It also has very clear security issues (can merge things out of order) thus me wanting to get your changes in that don't pertain to the actual signature format, then we can address that in a follow up PR where we have a key value map that we sign that contains the master head ref, the patch-id, and the current ref, with room to easily add more future key/value pairs for some future use cases that are out of scope atm.

I really value your input here. If you can elaborate on a specific attack, then lets solve for it. Otherwise we are going to end up with 2 basically identical projects with slightly different signature formats, which is still.

lrvick avatar Oct 04 '18 01:10 lrvick

To elaborate further, lets say we sign and base64 key/value map that contains:

  • patch_id: patch-id
  • head_hash: HEAD hash of this branch
  • target_hash: HEAD hash of the branch we want to merge on top of

This map could be msgpack, json, some tupl thing. I don't actually care that much as long as it is easy to parse in a wide range of programming languages.

Wiith that assumption we end up with multiple possible verification modes with different tradeoffs, and I think we can easily support all of them in future git signatures verify flags if the above set of 3 items are signed all the time.

Commit Ref

  • Verify every commit ref is signed by m-of-n signers

Pros

  • Most simple/secure and where we started

Cons

  • Rebases invalidate signatures
  • Merge commits invalidate signatures

Patch-id

  • Verify the patch-id for a given set of changes is signed

Pros

  • Signatures can survive a rebase/squash/merge as long as no LOC changes happen

Cons

  • security issue: patches could be merged out of order or summoned from the past

Patch-id + master HEAD

  • Verify the patch-id is signed -and- that it was merged on the designated master HEAD

Pros

  • Signatures can survive a rebase/squash/merge as long as no LOC changes happen

Cons

  • Only one set of changes can be signed at a time. Once one merges the others are invalidated.

Hybrid Verification

  • Reviewers sign the lastest LOC change
  • Reviewer signatures are only verified in patch-id mode
  • CI system signatures are created certifying tests passed etc
  • CI system only has -read- access to repo
  • CI automatically re-signs afer all cahnges if tests pass etc
  • CI automatically re-signs if the master HEAD ref changes
  • CI signatures are verified in patch-id + master head mode

Pros

  • Signatures can survive a rebase/squash/merge as long as no LOC changes happen
  • Redordering attack only possible if CI signing key -and- write access to repo are obtained

Cons

  • More complex to implement and setup initially

lrvick avatar Oct 04 '18 02:10 lrvick

@oxij Just in case because I was the one who pointed out JSON, using the JSON format for serialization doesn't mean the format cannot be fixed in stone. It just means we can (/must) use a JSON parser for handling it, in exchange for a few bytes of storage (basically, the {} and "", which are the only superfluous characters here). Actually, I think base64 storage would lose much more storage space than JSON, so even the storage argument isn't that noticeable (but if comparing with eg. key1:value1;key2:value2 we could indeed optimize for space)

So it's mostly a “do we want to be able to use a JSON library in exchange for being more or less forced to use a JSON library?” question. I personally think that if deciding that the format must be an object with only string values without escapes and no superfluous spaces, then it's not harder to parse than git objects: if I can allow myself to mix syntaxes for hopeful clarity, it gives this parser:

(\x -> x[1..-1]) |> split ',' |> map (split ':' |> map (\x -> x[1..-1]))

So to sum up, my comparison between base64-ed git objects (hereafter b64go) vs. restricted JSON (hereafter rjson) formats yields:

  • b64go needs a library for sane base64 decoding, rjson can be decoded with the same amount of processing than after base64-decoding of b64go without any library (well, yeah, with adding the two \x -> x[1..-1], I deem this cost negligible though)
  • rjson can be (encoded, decoded) easily with a library, b64go cannot (still require parsing git objects by hand)
  • rjson takes less storage space than b64go

All in all, I don't see a single reason to prefer b64go over rjson, apart from the fact that I agree it's very annoying that people use JSON everywhere without thinking of it. Do I miss a point? :)

(sorry, I managed to restrict myself on the RFC thread, but it looks like it actually got out here… well, at least here it's somehow on-topic :))

Ekleog avatar Oct 04 '18 02:10 Ekleog

@lrvick Patch-id, in my opinion, brings much more complexity (and thus occasions for vulnerabilities) for almost no benefit. From your list:

  • The commit-ref is already possible without it
  • The patch-id is flawed as you yourself mention, so unusable for any security purpose, and I can't see any point that's not security-related
  • The patch-id + master HEAD doesn't bring anything over commit-ref : if you just consider people will sign after merging (and not before), then all that is handled by patch-id + master HEAD is also handled by commit-ref
  • The hybrid verification system is the only case where there is a marginal benefit from using patch-id + master HEAD

I personally think that supporting a highly counter-intuitive signature scheme is not worth the minor benefit from hybrid verification systems, as anyway it's not more secure than commit-ref (actually, it's less) and the convenience advantages are limited (anyway, a committer will likely merge around the same time as they sign, and multiple-signature schemes need to be handled at the fetching end anyway so there's no TOCTOU)

Ekleog avatar Oct 04 '18 02:10 Ekleog

Actually, an additional concern about hybrid signing mode: it can be used by a maintainer to put the blame on another, by using TOCTOU.

Example:

  • Maintainer A signs patch A for master, that adds a free
  • Maintainer B notices and sees patch A isn't merged yet. Maintainer B signs patch B for master, prepones signature to earlier than patch A, and patch B adds another free at a place that doesn't conflict
  • Maintainer B tricks CI into accepting their patch before maintainer A's
  • Now there's a double-free and maintainer A's patch introduced it

So even it is not secure, so I cannot see a single advantage in using patch-id signing.

Ekleog avatar Oct 04 '18 02:10 Ekleog

@lrvick @Ekleog

JSON

I'm not like super-biased against JSON, I am a bit biased. But my thinking here goes like this: when in Rome do like Romans. git objects are " \0" header followed by "[ ]*" fields. I see no reason to use something different here (except " \0" is not needed as the first is obvious and the second is provided by the PGP packet), parsing those in bash is easy enough.

@lrvick

patch-id

@Ekleog pretty much wrote my thoughts on the subject.

  • Counter-intuitive for git users.

  • Simple stuff gets complicated. Your scheme with two more hashes feels like you made git-signatures for darcs VCS and now trying to adapt it to git. Ugly.

  • Computation of patch-ids themselves is complicated, merged PR itself changed diff-tree options several times.

  • I see no good uses for them. Signed patch-id literally has the following semantics: "I sign any repository state that can be made with this applied". I.e. an infinite number of repository states. This is ridiculous for any sufficiently large code base.

  • It interacts badly with lots of other stuff. E.g., I can make a commit with renames, pick a patch-id-signed commit from any other place in history, revert the renames. I just applied your change with your signature to a random similar-enough file. Fill in a bunch of out-of-context commits in between and you got a nice way to introduce backdoors, all you need to do is find a series of patches in history that can be chained to make a change you want and make maintainers sign some trivial renames. In a long enough history like nixpkgs you could then find a series of changes for anything. I'm sure there are more examples like this.

If you need that feature, it's much simpler to just make a separate signature type for it. But I don't think I would ever need it. Backports need to be checked and signed independently anyway.

oxij avatar Oct 04 '18 03:10 oxij

@oxij we had a chat on IRC: I think @ekleog and I can agree that at least a minimum that we sign the tree object. I think @lrvick is now on board with this. i.e. signing patch ids is no longer to be a thing.

Whether the parents should be signed is what myself and @Ekleog disagree on: we could make that configurable.

daurnimator avatar Oct 04 '18 03:10 daurnimator

I pushed the best complete working state I have to this branch (I managed to produce a BADSIG, yay!), now I'll have to take a break from full-time hacking till Sunday, I got some work to do. Do review, do not merge before we got a clear consensus.

Whether the parents should be signed is what myself and @Ekleog disagree on: we could make that configurable.

Parents? Signed together with what? Please elaborate.

oxij avatar Oct 04 '18 03:10 oxij

@oxij I'm in favor of signing a commit hash, @daurnimator is in favor of signing a tree hash, the difference being mostly whether the parent hash is signed.

My argument in favor of signing the complete commit hash is that I don't want replay attacks to be possible (otherwise a revert to an old system state would be automatically signed by the person who signed the old system state). I think @daurnimator's solution for this issue is timestamping signatures, which I find very fragile.

Ekleog avatar Oct 04 '18 04:10 Ekleog

@Ekleog I see. I'm also in favor of signing the commit hash, obviously.

IMHO timestamps should be rounded, if not set to 0. I don't want to leak timestamps of all my reviews to the whole world, I review stuff offline and timestamps are privacy-sensitive. (For commits I usually do git commit --amend --date=now before publishing.)

Hence I plan to add git commit-like thing here too (will have to reimplement some of git-notes myself for that, thought).

oxij avatar Oct 04 '18 04:10 oxij

Updates from IRC discussion:

patch-id has too many problems and complexity that can't be easily resolved, and we can get most of the wins we want with tree_id. I have been convinced to drop it in favor of tree_id which can address these use cases with a flag.

I will be making a follow up PR to make signatures be base64(gpgsign({"commit_id":"...",tree_id:"...","date":"..."}))

If you have a strong argument against json for the key/value bit in the middle I will totally hear it out.

From there I think that should very well line up with your changes in this PR and keep the same security assumptions.

lrvick avatar Oct 04 '18 04:10 lrvick

@lrvick

I will be making a follow up PR to make signatures be base64(gpgsign({"commit_id":"...",tree_id:"...","date":"..."}))

Please document you reasoning for these decisions here publicly, not in IRC logs. Why tree-id and date fields?

I wouldn't like someone rewriting my commit messages and reusing my signatures without my approval.

I think for things tree-id and patch-id could potentially be useful for there needs to be a separate command, with workflow similar to git rerere. I.e. "resign everything in $this branch if it has the same patch-ids/tree-id I already signed before in $that branch".

If you have a strong argument against json for the key/value bit in the middle I will totally hear it out.

Like I said, git Romans use a simpler format, I see no reason to use any other format, I would agree to JSON if we had to store hierarchical data, but I doubt we would need anything but lists of space-separated strings.

$ git cat-file -t 419ea86540a0e5e947fd523ae8004790d5cfda55
commit
$ git cat-file -p 419ea86540a0e5e947fd523ae8004790d5cfda55
tree 2c6620e30eec2d739a622a9ccc884ba07e84ca1a
parent be68971cd18226a15d4b3d784273e4d9c96872a2
author Jan Malakhovski <[email protected]> 1538493463 +0000
committer Jan Malakhovski <[email protected]> 1538526107 +0000
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEUUu5ZrRuNWUFCIboDmymblxVeqgFAlu0C5wACgkQDmymblxV
 eqhJNwgAgX0krEm5mEQGj8wViowO4d95dpAcn9FXsk9Jnz6uNCMDvLXkKHC2kEeS
 mBDfoOr47AdpO7QGMwnIBXsbG4vcN5OcjQgPDReUdd8Y81vxNfwGplMPDonwRS7r
 ivdvLTZapSHCFMc3plWbBQXnee6TOOr/g1M/OF6RnCOUx+GnU0dTUjaw4dQCY7LH
 xsWWh4ofXyM7fH7JEsGSVFAOwF21DFVTcBxO2x3zE/Q19NeaoqHTqAe/Tto85dL/
 QlGHkaiXD7vnsOnTOwc5kNWjsiWC+KH4u4fBlEYpng7ovBMU0REg6ha+S/jeLVZv
 M+gN3LDTDF+FzPzBR+ZNKOJDOvuSgA==
 =n7lq
 -----END PGP SIGNATURE-----

maintainers/maintainer-list.nix: add my own PGP/GPG attributes

First! :)

If you read the source of git/builtin/cat-file.c you'll see that -p for "commit" objects just prints out their content verbatim. I.e. the above is the verbatim format git uses for commit objects. I'm totally okay with that format. I kinda like how RFC822-ish it is.

On a similar note, I think it would be good if this project was at least dual-licensed as GPLv2 (see what git Romans do). But I guess in the worst case I can just relicense after I rewrite everything, which, ATM I feel I'll end up doing anyway.

oxij avatar Oct 04 '18 11:10 oxij

@oxij I didn't read the source of cat-file.c, but am pretty sure you're wrong on this one. Experiment:

$ cd /tmp
$ git init test
Initialized empty Git repository in /tmp/test/.git/
$ cd test
$ touch test && git add . && git commit -m "test"
 [master (root-commit) 9400242] test
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 test
$ git cat-file -t 9400242
commit
$ git cat-file -p 9400242
tree f05af273ba36fe5176e5eaab349661a56b3d27a0
author Léo Gaspard <[email protected]> 1538659024 +0900
committer Léo Gaspard <[email protected]> 1538659024 +0900

test
$ xxd .git/objects/94/00242*
00000000: 7801 95cd 410a 0231 0c40 51d7 3d45 f682  x...A..1.@Q.=E..
00000010: a4cd 34b5 20e2 ce8d 9748 6752 1d70 a874  ..4. ....HgR.p.t
00000020: e2a1 3c87 1713 f404 2eff e2f1 c7b6 2cb3  ..<...........,.
00000030: 8167 da58 5785 8a51 6a48 5484 b86a f489  .g.XW..QjHT..j..
00000040: 35aa 48a1 2133 7b89 5c68 0a49 d0c9 d36e  5.H.!3{.\h.I...n
00000050: adc3 e5fd 6a70 96f5 217d 82c3 5ddb e9fa  ....jp..!}..]...
00000060: 8bdd dc8e e023 ed39 660c 036c 3123 baf1  .....#.9f..l1#..
00000070: bb33 fd1b 3ad3 d5dc 0781 ab35 58         .3..:......5X

This is clearly not just outputting the file, and I guess what we see here is the layer of deflate I mentioned somewhere.

Ekleog avatar Oct 04 '18 13:10 Ekleog

@oxij @lrvick

Suggestion: make at least the date argument optional, potentially tree-id too (as we'd going the route of optional fields anyway).

Reasoning: For date, people may not want to leak the date at which they review (cf. https://github.com/hashbang/git-signatures/pull/13#issuecomment-426880406 for at least one practical example, I'm totally sure there are others). So if the field is not optional, these people will set it to timestamp 0, which will likely break the world. At least by making the field optional, tools will have to expect its absence. For tree-id, well, optional-ness is not required, but optional-ness is required for date anyway, so might as well make it easier for people who'd want to implement an alternative compatible version of the tool without this debatable feature.

Now, interoperability between the two projects (and potentially a third by me if you two both do it in bash without libgit2/gpgme :p) would still require at least agreement on the data format (json vs git-like). I still have a slight preference for the restricted-json format just so that it's possible to read/write it in most languages without having to write a parser (even though it'd be easy to write one for it), but it's true we're most likely never going to need hierarchical data, so…

Ekleog avatar Oct 04 '18 13:10 Ekleog

Now, misc. potentially off-topic notes:

@oxij, GPLv2 potentially has a legal issue that arose with the recent troll about contributors withdrawing their contributions from the Linux kernel following the addition of a CoC: contributors do not explicitly engage themselves to actually continue giving the license, so it may (or may not) be the case that they could indeed rescind the right to use their code by notifying anyone they want. This is the topic of the following paragraph added to GPLv3:

  All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met.  This License explicitly affirms your unlimited
permission to run the unmodified Program.  […]

This issue also applies to MIT, though. This is the reason why most of the Rust ecosystem dual-licenses in MIT/Apache2, as Apache2 has a similar provision of irrevocability. Disclaimer: I am not a lawyer, this is not legal advice.

tl;dr: I don't see any reason for using GPLv2, even if Romans do so. After all Romans used abacus as computers, and thankfully we no longer do.


@lrvick You forgot the part about review result and review comment, at least.


@oxij Actually, thinking about it again, we would likely want multi-line review comments. So that would require either allowing escaping in the JSON, or having a format really similar to git-commit's inside-deflate format. I'd still be vaguely in favor of escaping in the JSON, but suddenly it becomes much less clear, because it actually requires a full-blown (or almost) JSON parser, even though the parser for git-commit's inside-deflate format would also take a serious hit in simplicity.


Finally, thinking again about making date optional: it means that in order to allow cat_uniq_sort to still work, we'd need a sn (serial number) or similar key, so that the consuming programs could take into account only the latest review without relying on ordering. I'd be of the opinion of making sn mandatory, as it leaks no private information.

Ekleog avatar Oct 04 '18 13:10 Ekleog

$ xxd .git/objects/94/00242*

Yes, git stores everything compressed (but without the gzip header, so you can't just ungzip). cat-file only decompresses it in the use case above, so the data above is verbatim data git will parse when doing git show, git log and anything else that deals with commit objects.

In more detail: running git cat-file -p will call read_object_file on a given hash, which calls read_object_file_extended which calls oid_object_info_extended which calls sha1_loose_object_info which calls unpack_sha1_rest which does the unpacking after the header (all in sha1-file.c in git sources).

Sometimes I wonder if it is to much to ask for a "simple git" that is slow and maybe has no xdelta packfiles, but can still parse and manipulate simple git objects in couple thousand C LOC. Testing functional equality between git core builtins and simple-git core builtins would be useful, and explaining how git works to people that can read the source would be easier.

oxij avatar Oct 04 '18 14:10 oxij

GPL

I see. I'm fine with either GPL.

Alternative tool in another language

Hm, I see, I'm not against that. If we decide on the plumbing commands and write only the UI in bash we can then make the top-level git wot commands into wrappers that would call both tools and test for equality of results :)

message format

Yes, RFC822-alike makes much more sense for this. I once wrote RFC822 parser in bash, it was not hard.

Finally, thinking again about making date optional: it means that in order to allow cat_uniq_sort to still work, we'd need a sn (serial number) or similar key, so that the consuming programs could take into account only the latest review without relying on ordering. I'd be of the opinion of making sn mandatory, as it leaks no private information.

Yes, yes yes, my secret plan is to reuse the timestamp field in PGP packets for this (by setting it to sn via faketime :]).

oxij avatar Oct 04 '18 14:10 oxij

So I guess we now only have to agree on a data format, so that all our tools end up being interoperable :)

First, the non-controversial parts: what fields should we include? I can think of (also taking ideas from crev, which has the difference that it's attempting to review the whole state of the repo instead of split it by commit):

  • commit-id
  • tree-id (optional)
  • date (optional)
  • serial number (required if no date)
  • review comment (optional, multiline UTF-8)
  • review result (+, -, neutral -- equivalent to “trust” from crev)
  • review thoroughness (0-100% ? low-medium-high ? -- indicates understanding of the diff)
  • context understanding (0-100% ? low-medium-high ? -- indicates understanding of the context of the diff)

I wonder whether we should have some default values for review thoroughness and context understanding, or whether it's reasonable to force them being present in the signed blob.

And the second likely non-controversial choice: where should we put the signatures? I would say each signature on one line, sorted in a file named commit-id (potentially copied to tree-id if defined) in a refs/signatures branch (ie. like git-notes, as I understand it)

Once we get that agreed, we will be able to reasonably go to the question of rfc822 vs git-object-like vs JSON, which will be very happy I'm sure :D

Ekleog avatar Oct 04 '18 17:10 Ekleog

So I guess we now only have to agree on a data format, so that all our tools end up being interoperable :)

Yes.

So, I've read crev source and I agree with their idea that trust level in the key and trust level in the judgment are different things. So, I guess, we also need that. The simplest solution I see is just use two gpg trustdbs: one for "keys trust" and a second one for "judgment trust". I think there should be no arguments against using --export-ownertrust format for that, right? We would also need to agree on git config options for those --trustdb-name and related things.

So to verify a repo in nixpkgs case you would do something like

cd nixpkgs-keys
for key in keys/*.key; do
    gpg --import "$key"
done
# trust in keys belonging to whom they claim
cat bootstrap | sed 's/$/:6:/' | gpg --trustdb-name nixpkgs-keys --import-ownertrust
# trust in those people understanding what they are doing
cat bootstrap | sed 's/$/:6:/' | gpg --trustdb-name nixpkgs-judgments --import-ownertrust

cat trusted-contributors | sed 's/$/:6:/' | gpg --trustdb-name nixpkgs-keys --import-ownertrust
cat trusted-contributors | sed 's/$/:4:/' | gpg --trustdb-name nixpkgs-judgments --import-ownertrust
# and etc ...

cd nixpkgs
# wotr for "WoT reviews", subject to change, but I like it
git config wotr.keys-trustdb nixpkgs-keys
git config wotr.judgments-trustdb nixpkgs-judgments
git config wotr.trust-model pgp # the default, can also be tofu or whatever
git config wotr.completes-needed 2
git config wotr.marginals-needed 4

Hydra only does the above, users may add more keys and more/less trust as they wish.

First, the non-controversial parts: what fields should we include?

Btw, crev-data/src/proof/review/mod.rs has "thoroughness", "understanding", "trust", and "distrust". I'm like "WTF"? The "distrust" is clearly an artifact of their "low", "medium", "high" enum, but what does "trust" in my own review even mean? I disagree with that terminology.

I agree, however, that splitting my "level" into "thoroughness" and "understanding" might be useful, though.

So, here comes a draft of my design RFC:

---- BEGIN RFC ----

Definitions and data formats

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119 when, and only when, they appear in all capitals, as shown here.

The plain-text data thing in yet undecided format described below is a "review", review is signed by gpg and turned into a "packet" (PGP packet of the review and signature together), that is then base64 encoded and is appended to "packet-list", which is stored in a separate git branch called "review-branch" indexed by commit-id (before patch-id changes were merged, currently managed by git-notes, but that might change). All date values are UNIX timestamps.

Verification Process

Commits in review-branch MUST be signed, ALL commits MUST be validated before interpreting, invalid signature on review-branch commit MUST cause a critical error.

Commits in review-branch MUST monotonically increase their "author" and "commiter" dates by alternating the two types. I.e. the following MUST be observed:

0 <= commit_1_author_date <= commit_1_commiter_date <= commit2_author_date <= commit2_commiter_date <= ...

Any other sequence MUST cause a critical error.

Packets MUST always appended to the end of the packet-list, insertion anywhere except at the end of the packet-list MUST cause a critical error.

Packets with signatures by the same key in review-list MUST monotonically increase their packet's "date" field, failure to observe that MUST cause a critical error.

The "packet's-commit-date" is the earliest of "commiter" or "author" dates of the commit introducing the given packet into the review-branch.

Packet's "date" field MUST be <= than its "packet's-commit-date", failure to observe that MUST cause a critical error.

Note that the above still allows packet's "date" field to be interpreted as a "serial-number" as long as the packet itself was not signed with "expire date" field set, which is discussed below.

When interpreting packet-list only the packet with largest "date" field (serial-number) MUST be considered for each signing key, all other packets for the same key MUST be ignored.

When verifying a packet "GOODSIG" gpg status means the packet SHOULD be accepted as authoritative, "EXPKEYSIG" is explained below, "EXPSIG", "REVKEYSIG" and "ERRSIG with NO_PUBKEY" means the packet MUST be ignored, all other statuses, including "BADSIG" and "ERRSIG without NO_PUBKEY", MUST cause a critical error (can be made into a warning, in case gpg guys break something, I guess).

"REVKEYSIG" MAY cause a warning.

Note that by using "expire date" in the packet and "EXPSIG" gpg status code you can have reviews that automatically rescind themselves. I doubt this would be used much, but it can be useful in some use cases, no need to prohibit that, I think.

"EXPKEYSIG" gpg status code means the key's own signature has expired, it SHOULD be interpreted as "GOODSIG" if packet's-commit-date is <= than key's expiration date, MUST be interpreted as "ERRSIG with NO_PUBKEY" otherwise.

To reiterate, the following (among other things, i.e. not "iff") MUST be observed for packet to be accepted as authoritative:

packet's "date" field <= packet's-commit-date <= author's key own "expire date"

Failure to observe the first comparison causes a critical error, failure of the second only causes the packet to be ignored. (Because expired key can be extended after the fact by re-self-signing.)

Failure to interpret the contents of "GOODSIG" review MUST cause a warning and MUST cause such a review to be ignored.

Side note:

If we allow a review encoded in the packet to have its own "date" field (which I don't like for the following), it MUST be <= packet's-commit-date and >= packet's "date" field, failure to observe that MUST cause a critical error.

I.e. the following MUST be observed:

   packet's "date" field <= review's "date" field <= packet's-commit-date <= author's key own "expire date"

The problem with allowing that is you would have to interpret non-authoritative packets too to keep the above semantics of "everything that looks like MITM causes a critical error so that everyone would immediately notice". Hence, I'd rather not have these dates there.

Review format (given in git-object-like form)

Things after "#" and whitespace before are not in the grammar (comments)

<header>
<metadata fields>

<comment>

Should be read as "By signing this text I certify that I validated the following ...".

Header

Either

commit <commit-id> diff
[with <commit-id> diff
with <commit-id> diff
with <commit-id> diff]

meaning that the reviewer certifies the "diff" of this "commit" when applied to the parent of the first "" together "with" commit "diff"s of the later "commit-id"s

or

commit <commit-id> state
[of <filepath relative to repository root>
of <filepath>
...
of <filepath>] # empty list means "the whole thing"

meaning that the reviewer certifies the "state" "of" the following "filepaths" in "" tree.

Can be similarly extended for tree <tree-id> diff, tree <tree-id> state, patch <patch-id> diff. As you see, the difference between "diff" and "patch" is intentional here, "diff" is "change" in the abstract sense (I can agree to replace "diff" with "change" even), while "patch" is "exactly this patch file".

Metadata fields

# "while having archived the following" when reading "... I certify ..."
[context-understanding (medium|high|author)] # low by default, indicates understanding of the context of the diff
[diff-understanding (medium|high)]           # low by default, indicates understanding of the diff itself
[thoroughness (medium|high)]                 # low by default, indicates the effort spent checking it does what it claims to do
# "with the following" when reading "... I certify ..."
[result (!|-|+)]                             # 0 by default, this rough equivalent of "trust" of crev
[result-otherwise (!|-|+)]                   # "!" by default with header conditions, "0" without header conditions, this line MUST not be present without header conditions, this is the "result" when not all header conditions are met
# "and I think this should be assigned the following"
[priority (medium|high|panic)]               # low by default, doesn't influence anything, should be used for grabbing attention of other reviewers

"context-understanding":

  • "low" = "I might have seen this subsystem before.",
  • "medium" = "I looked into this subsystem intentionally and maybe made little changes there before.",
  • "high" = "I made big changes there before.",
  • "author" = "I wrote a good chunk of this subsystem myself, I know it in and out".

"diff-understanding": self-explanatory.

"thoroughness":

  • "low" = "glanced over",
  • "medium" = "read, evaluated/tested",
  • "high" = "spent a bunch of time playing with it".

"result":

  • "!" = "has a security issue!",
  • "-" = "I disapprove",
  • "0" = "FYI",
  • "+" = "I approve".

"priority": self-explanatory.

Examples

"Nothing looks bad, but don't trust me" would be

<header>
#context-understanding low
#diff-understanding low
#thoroughness low
result +

LGTM.

Drive-by LGTM in a system you know well would be

<header>
context-understanding high
diff-understanding high
#thoroughness low
result +

LGTM.

Thorough LGTM in a your own subsystem would be

<header>
context-understanding author
diff-understanding high
thoroughness high
result +

Running on top of this for a month.

"Nothing looks bad, but please can someone else review this" would be

<header>
#context-understanding low
#diff-understanding low
#thoroughness low
#result 0
priority high

Please, can somebody else review this?

"This looks very bad!" could be

<header>
#context-understanding low #or something else
#diff-understanding low #or something else
#thoroughness low
result !
priority panic

Backdoor?

"This looks a bit bad" could be

<header>
#context-understanding low #or something else
#diff-understanding low #or something else
#thoroughness low
result !
#priority low

Combined with X can lead to a low-impact security issue Y.

"WTF is this? This should be reconsidered!" could be

<header>
context-understanding high
diff-understanding medium
#thoroughness low
result -
priority panic

Please, revert! I think this will break X, which would be very high-impact!

CVE:

commit <commit-id> diff
with <commit-id with a fix>
context-understanding high
diff-understanding high
thoroughness high
result +    # if "with" commit is applied
#result-otherwise !
priority panic

This commit has a very high-impact open CVE! Fixed in <commit-id with a fix>, apply that immediately!

"I disapprove, here is a fix in my repo":

commit <commit-id> diff
with <commit-id with a fix>
context-understanding high
diff-understanding high
thoroughness high
result +
result-otherwise -
priority high

I think this should be reverted and done via another approach. See <commit-id with a fix> in my repo.

---- END RFC ----

I wonder whether we should have some default values for review thoroughness and context understanding, or whether it's reasonable to force them being present in the signed blob.

I opted in for "low". But this can be reconsidered, indeed.

oxij avatar Oct 05 '18 00:10 oxij

Whoops, looks like message went out too early… please disregard previous message if you received it by email. Comments about the RFC coming in later when I've finished writing them.

The simplest solution I see is just use two gpg trustdbs: one for "keys trust" and a second one for "judgment trust".

Hmm… Actually, I'm not really sure we need keys trust? If judgement trust is associated to a public key anyway, then there have been manual vetting of said public key (to attribute it a judgement trust), so we wouldn't need to also verify the key (actually, I may decide to trust your public key if I see you doing good reviews, without ever having checked your identity and thus not having signed your key, and even less given it ownertrust-in-the-WoT-sense)

I think there should be no arguments against using --export-ownertrust format for that, right?

Actually… GnuPG is not the only OpenPGP client, and sequoia-pgp is getting closer to being a usable OpenPGP client that may end up sucking less, so I'm not really sure about that.

Also, I wonder: anyway we won't be able to re-use GnuPG's WoT computation code for signatures… would we? If we won't, then why not just store them in a fingerprint:judgement trust: format? This way it's both trivial to parse and yet be trivial to import into an ownertrust db if need be (we could also decide to remove the trailing :, potentially)

wotr for "WoT reviews"

I first read w-OTR (off-the-record)… though I don't have any better idea, apart from just “wot” which would be potentially ambiguous.

Ekleog avatar Oct 05 '18 05:10 Ekleog

So while I don't intend on implementing this for a few weeks or want to further derail things, I want to point out that I do have one strong use case for hierarchical data: projects that yeild compiled artifact or artifacts.

I would want to support an optional flag to include a hash map of a directory full of output artifacts.

If multiple CI systems or reviewers agrees on the output binary hashes we now can attest 2 things strongly:

  • this code builds deterministically
  • these binaries came from this code

Those signatures could be detached with the repo and exported to allow the binaries to be verified independantly downstream and anchored back to the repo ref they were compiled from.

I experimented with the following format today as a single json line incorperating ideas from @oxij with mine:

{
  "body":{
    "commit":"d27e15a09812241167a61c3222cb6e97c5b0d6b0",
    "tree":"58215f173d6ef6892a07c9073d55145f387d1968",
    "patch":"068a37e54586de8339f13ec980d31f6c30b6f6e7",
    "date": "2018-10-05T03:01:46Z",
    "review": {
      "context-understanding": "author",
      "diff-understanding": "high",
      "thoroughness":"medium",
      "result":"+"
    },
    "artifacts": {
      "out.tar.gz":"2ad6f470b5398251018a4c25f5eb35686481681f"
    }
  },
"sig":"iQJFBAABCAAvFiEEZ1U/vaRrtxq9LgsLjkeh7DWhVR0FAlu2h3IRHGxhbmNlQGxydmljay5uZXQACgkQjkeh7DWhVR0K9Q//T+UleZFWlbiIFoUBp1hX0xzg/eEaTFnnlCdWWaa5f1E+TI80Pg/wXhpEUd98ghZThXwaWJwnPp34BpZ55GU5Qr1cXjY00Yt7I0p3a5TUsdk5FP8tiUWNQ9kA6npUOIVixa8HFkhz0SvHvXNAvWsasqw6nb+SNfA+aQYIP/wAY71ZpMLkJsMQssaFWsjL/hsSmOpi7VyLEFEuUmEM1CA6VPiMvp4rvP75Bt4DnEouyFPxk4WbVIqX4DIeG8+5W8jQCLlxV6HDH4g+POdZbD7/Yg6XjDKEfD2vM/OYrBczPZX0Tu1WhVyYOtQ/WsAr9wzZjgl50/9UkP6E4pt9ft42jdNhGcO1sEwzlC5W1efC2izjL4medEiLe00zIM0L42X10HebH22j4gKyb4EErSHo0H5eLisf1xAwAHiOE2wJou1SR5dVmKydXicVN3wuLqhzc52awmoOaEts1Q3zFyB8MfDIi39CACwAvOTrKw54raoGgzxDEFEP3sKWF1FuZQJreG1Ufg57Oxv5f1lOqvaoMN7ZNjcQkPUAqj7G7e/fwyXGCG2mvnz++D5/2JAMeeO4k8iwxGXEsF0qD3QveEZPFoIpPZX+uOGqM9iIPDJd9bGu0xI5yWeqLpbdXdm4ClsiRWOmavUvjm8RK99o7I9EvctCzrIrGnwlySZ0+Po856I="
}

I suggest of the above only commit and date be included by default via "git signatures add" with support for the rest with future flags and verification modes.

Simple and high security by default with flexibility and some degree of future-proofness for real world use cases.

Now for the most controversial part of the above, is my re-introduction of (optional) patch id support.

@tylerlevine made some strong points on the #! IRC today. It made me realize we actually have 3 verification levels possible which each have different use cases and tradeoffs. I have taken to calling this the "anchor" status. The signature will be valid, but the anchor status will tell you at what security level it is anchored to the current tree we are validating it in.

Stick with me as I try to lay down some context...

So there are three descending levels of verification possible and only the highest level needs to be reported.

Trust Anchor levels

commit

  • Signature valid against the commit hash of the ref we are verifying
  • Signature invalid after squashing
  • Signature invalid after changing commit messages
  • Signature invalid after rebasing
  • Signature invalid if changeset is moved to a new location in this repo or another repo

tree

  • Signature valid against tree hash of the ref we are verifying
  • Signature valid after squashing
  • Signature valid after changing commit messages
  • Signature invalid after rebasing
  • Signature invalid if changeset is moved to a new location in this repo or another repo

patch

  • Signature valid against patch id hash of the changeset we are verifying
  • Signature valid after squashing
  • Signature valid after changing commit messages
  • Signature valid after rebasing
  • Signature valid if changeset is moved to a new location in this repo or another repo

Use Cases

Small team willing to allow only one "ready to merge" changeset at a time

Use "commit" level anchoring and truck on. It is the default and you need special flags to verify any other way.

Above small team that also wants to be able to squash and amend comments

Consider "tree" level anchoring which offers greater flexibility.

Understand the tradeoff here is that a bad actor can squash all commits into one and change the commit message to offensive ascii art, but can't actually change the code.

An organization with many in-flight diffs and reviews merging out of order

Consider "patch" level anchoring for authors/reviewers.

Understand that this can only certify that a given changeset was authored and reviewed in isolation.

You -must- pair this with a CI system or maintainer(s) that will do one of:

  • Do only signed merge commits to master branch
  • Sign trusted HEADs/tag with "commit" or "tree" level anchoring post-merge

If you fail to do one of the latter steps to prevent history mutation you are knowingly creating a security hole and are a terrible person.

lrvick avatar Oct 05 '18 06:10 lrvick