Sigstore bundle
Closes https://github.com/sigstore/cosign/issues/2131
Authored by @kommendorkapten and @patflynn
Summary
First iteration of the proposed new bundle format for cosign. See README.md
for more details. The intent for this PR is not to be merged in this state; it is to start the discussion, as decided at the design review meeting on 2022-08-22.
Before accepting, make sure these tasks are done:
- [ ] https://github.com/sigstore/community/discussions/136 concluded
- [ ] Update the binary fields to have the correct datatype.
- [ ] Move shared messages (Certificate, CertificateChain, Signature, etc.) into a shared place and reuse those definitions.
Release Note
N/A as of now.
Documentation
N/A as of now.
Ping: @asraa @haydentherapper @dlorenc @bobcallaway @znewman01 @priyawadhwa
Rendered README.md Rendered verification_flows.md
Fredrik is out for a couple of weeks; one approach to iterating on this would be to merge as is, and then we can work through all the comments on this PR, generating new PRs as we go. It's going into a temporary spot anyway, WDYT?
I think I'd prefer to work on a branch/fork; I don't think the diff between "original pass" and "latest pass" is meaningful in this case. Then we don't have to worry about tests etc.
I couldn't keep myself from going over the comments, thank you all so much for all the feedback. I'm away this week too, but I will dig right into this when I get back!
@znewman01 @haydentherapper I think I have addressed all comments for bundle.proto (ignoring the encoding/string/bytes question). Can I please get a review of the updates, and possibly resolve any comments you think are solved, so we can have a better sense of the current status? I'll go over the verification flows early next week.
The question on encoding is something we should follow up on later, or create an issue to track, but don't let that distract us now 😄
Yep! I'll do a more thorough review early next week.
I’m neutral on that question. Either way works for me!
@kommendorkapten commented on this pull request.
In specs/dotsigstore_bundle/pb/bundle.proto:
@@ -0,0 +1,187 @@
+syntax = "proto3";
+package sigstore;
+// https://raw.githubusercontent.com/secure-systems-lab/dsse/9c813476bd36de70a5738c72e784f123ecea16af/envelope.proto
+import "envelope.proto";
+
+option go_package = "github.com/sigstore/xxx/bundle";
+option java_package = "com.github.sigstore.xxx.bundle";
+
+// Notes on versioning.
+// The primary message ('Bundle') MUST be versioned, by populating the
+// 'media_type' field. Semver-ish (only major/minor versions) scheme MUST
+// be used. The current version as specified by this file is:
+// application/vnd.dev.sigstore.bundle.v0.1+json
+// The semantic version is thus 'v0.1'.

How about moving the version from the subtype into the parameters field? I.e. like this: application/vnd.dev.sigstore.bundle+json; version=0.1. IMHO this is a cleaner approach, and easier to work with. Note that the Sigstore signature spec relies on OCI formats, and so uses a format where the version is part of the subtype: https://github.com/opencontainers/image-spec/blob/main/media-types.md#compatibility-matrix
WDYT @haydentherapper @znewman01 ?
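As an aside, a quick Go sketch (not part of the PR) of why the parameter form is a bit easier for clients to consume: the standard library's mime package splits the version out for free, whereas the in-subtype form comes back as one opaque string that each client has to slice apart itself.

```go
package main

import (
	"fmt"
	"mime"
)

func main() {
	// Version carried as a media-type parameter: the standard parser
	// separates the base media type from the version for us.
	mt, params, err := mime.ParseMediaType("application/vnd.dev.sigstore.bundle+json; version=0.1")
	if err != nil {
		panic(err)
	}
	fmt.Println(mt)                // application/vnd.dev.sigstore.bundle+json
	fmt.Println(params["version"]) // 0.1

	// Version embedded in the subtype: the parser returns the whole thing
	// as one string, so clients must extract "v0.1" by hand.
	mt, _, _ = mime.ParseMediaType("application/vnd.dev.sigstore.bundle.v0.1+json")
	fmt.Println(mt) // application/vnd.dev.sigstore.bundle.v0.1+json
}
```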
I haven't been reviewing generated code etc
I have started to rework the examples and the verification flows etc. But as you said, waiting for consensus too.
Alternatively, we could make a new home just for the protos, and publish releases for various language ecosystems from there.
This is my personal preference too: a single place where all proto definitions live is easy to work with/navigate and, as you said, a common place where all language clients can reside.
great discussion here - any chance we can ensure the namespace matches what we already have in fulcio, dev.sigstore.*?
any chance we can ensure the namespace matches what we already have in fulcio dev.sigstore.*?
Of course, my bad.
When reviewing, my preference would be to focus on the protobuf files. The verification algorithm is secondary. I'd rather first see that we get some movement on the bundle format; then we can discuss the verification algorithm separately, in a different PR. Thoughts on that flow?
There haven't been any comments on the bundle format in a long time now. Is there a chance we can take this forward @bobcallaway @haydentherapper @znewman01 ?
The proposal has been to have a new repository, like sigstore/protobuf, where the protobuf files are located. This would also be a great place where all language clients could be built (and, for the Go client, the generated code).
I think we're in agreement that the format is in a good state, sans whatever comments arise in a final review. What's the next step @kommendorkapten? Should we begin more implementation in this PR, or remove all of the example code and just check the bundle format in first?
@haydentherapper I think getting the bundle format itself approved and merged first is the most important step, as there are a few folks waiting for it to be "finalized". Once the bundle format is approved, we can review the verification flows etc.
I'll go ahead and clean up this PR, and remove everything but the protobuf files.
The Alpine Rekor entries include metadata from the PKGINFO. AFAICT, these are just unvalidated metadata and duplicate information from the artifact.
The PKGINFO key value pairs are what is covered by the signature embedded within the APK. One of those KVPs is datahash, which is the sha256 digest over the data.tar.gz (containing the actual package content). So the integrity of all KVPs is covered by the signature, but the only KVP that (can be &) is explicitly validated by Rekor is datahash. The rest of the KVPs are included in the Rekor entry for search convenience, but could be removed in a future version of the alpine type if desirable.
To verify the Rekor signature, you'd need to parse the .apk file and format the entry yourself, which feels like a lot of work. Then, the signature lives inside the artifact.
What do you mean by "the Rekor signature"? While I agree that recreating the leaf hash for the canonicalized entry would be awkward, you could either use the golang package to do this OR if you had the public key and APK you could simply compare their SHA256 values to those provided in the entry which would be sufficient.
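For illustration, a minimal Go sketch of that second option (the file names and the expected digest values are placeholders, not the actual Rekor client API): compute local SHA-256 digests of the APK and the public key and compare them against the digests recorded in the alpine entry.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
)

// sha256Hex returns the hex-encoded SHA-256 digest of a local file.
func sha256Hex(path string) (string, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	// Placeholder values: copy these from the Rekor alpine entry.
	entryPackageDigest := "<sha256 of the .apk recorded in the entry>"
	entryPublicKeyDigest := "<sha256 of the public key recorded in the entry>"

	pkgDigest, err := sha256Hex("package.apk")
	if err != nil {
		panic(err)
	}
	keyDigest, err := sha256Hex("signer.pub")
	if err != nil {
		panic(err)
	}

	fmt.Println("package matches entry:", pkgDigest == entryPackageDigest)
	fmt.Println("key matches entry:    ", keyDigest == entryPublicKeyDigest)
}
```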
I feel like I'm missing some context on the design philosophy of Rekor entries. Here's how I think about it:
- A Rekor entry contains:
- some signed data (or a reference to such data)
- the signature over that data (or the signature could be in the artifact itself)
- optionally with unverified metadata that can be used for searching.
- The entry can contain the data itself or just a hash; it can contain the signature itself, or the signature can live inside the data.
- Users can use the unverified metadata for searching, but it should be considered untrusted.
- After verification, users can trust the artifact along with some of the other metadata (exactly which metadata depends on the verification procedure used from the type).
- They still cannot trust the unsigned metadata, unless they reconstruct it from the artifact (at which point, they should just use the metadata from the artifact).
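As a purely illustrative Go sketch of that mental model (these are not the actual Rekor types), an entry conceptually carries something like:

```go
package main

import "fmt"

// ConceptualEntry is an illustration of the mental model above,
// NOT the actual Rekor schema.
type ConceptualEntry struct {
	DataOrDigest         []byte            // the signed data, or a digest referencing it
	Signature            []byte            // may be absent when the signature is embedded in the artifact
	VerificationMaterial []byte            // key or certificate the signature verifies against
	SearchableMetadata   map[string]string // e.g. PKGINFO key/value pairs; handy for lookups, not independently verified
}

func main() {
	e := ConceptualEntry{SearchableMetadata: map[string]string{"pkgname": "example"}}
	fmt.Println(len(e.SearchableMetadata), "unverified metadata field(s)")
}
```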
This is accurate with one exception: What unsigned metadata are we persisting in the log? I'd argue that is a correctness bug we should fix immediately.
Rekor entries should contain sufficient information for users to independently verify that the signature over a given artifact was correctly verified using the verification material included within the entry. We'd rather not persist the signed data (with a few exceptions). Dealing with types for embedded signatures vs detached signatures represents the vast majority of this complexity at the moment.
Are we better off fixing the Rekor entry types themselves to remove the VerificationMaterial? Rekor wouldn't be able to validate the entries, but it already can't, in a lot of cases.
Rekor must validate the tuple of (verification material, signature, artifact) before inserting it into the log. I agree that it can not validate it again solely based on what is included in the log entry (in some cases), but unless we start to act as a content store I don't see a way around that.
Appreciate that, @bobcallaway!
I think part of the issue is that there are several signatures in play, which gets confusing. I'll try to be more specific.
My earlier comments were largely from the perspective of a client doing verification. They get handed a bundle and an artifact, and know a verification policy a priori. How can they verify an Alpine package? I don't think they can!
What do you mean by "the Rekor signature"?
I meant that it's hard to verify the signature from Rekor itself over the complete "alpine" entry given the information in the bundle as-is. The problem is that we don't include the full entry in the bundle. I guess you could (1) put it alongside the bundle or (2) look it up online. But that was the point of the bundle—to prevent you from needing to do that. And if you don't have the full entry, and can't compute the full entry, how can you check that the signature on the TransparencyLogEntry in the bundle corresponds to the right signature on the right artifact?
While I agree that recreating the leaf hash for the canonicalized entry would be awkward, you could either use the golang package to do this
Yeah, that's true but pretty annoying (especially for a client written in some other language—we now have an M x N set of language/type pairs to maintain). Further, what if some fields were optional to include? With alpine, it seems legit to only include the datahash key-value pair in your public Rekor entry. How would a verifier trying to reconstruct it know that?
OR if you had the public key and APK you could simply compare their SHA256 values to those provided in the entry which would be sufficient.
My point is that you don't have the entry available with just the artifact and the bundle to do such a comparison.
- They still cannot trust the ~~unsigned~~ unverified metadata, unless they reconstruct it from the artifact (at which point, they should just use the metadata from the artifact).
This is accurate with one exception: What unsigned metadata are we persisting in the log? I'd argue that is a correctness bug we should fix immediately.
That was a typo, oops. I meant "unverified," in the sense that Rekor just takes the client's word for it that this metadata is correct (as with the PKGINFO data). Edited above.
Rekor must validate the tuple of (verification material, signature, artifact) before inserting it into the log.
Devil's advocate: what happens if it doesn't do that validation and instead just checks the format? Anything catastrophic?
Are we better off fixing the Rekor entry types themselves to remove the VerificationMaterial? Rekor wouldn't be able to validate the entries, but it already can't, in a lot of cases.
I agree that it can not validate it again solely based on what is included in the log entry (in some cases), but unless we start to act as a content store I don't see a way around that.
Agreed that we probably shouldn't start storing all the content, and I think that's okay.
My proposal was pretty narrow (but ill-articulated, sorry). Basically, one stated reason for not including the entries verbatim was that we don't want to put naked public keys inside of the bundle, for fear that verifiers will use them without checking that they correspond to meaningful identities. If we break down today's Rekor entry into (a=public key, b=everything else), we could stick only (b) in the bundle. This gets a little dicey—still probably want to stick the verification material in the log, which means that clients need it. You could try to get fancy by using H(a) || b for a Merkle leaf value and distributing H(a), but that's pretty gross.
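For concreteness, the "fancy" construction being dismissed here would look roughly like this (a sketch only, nobody is proposing to implement it):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

func main() {
	publicKey := []byte("PEM-encoded verification material")     // "a" above
	rest := []byte("everything else in the canonicalized entry") // "b" above

	keyDigest := sha256.Sum256(publicKey)      // H(a) — the only key-related value distributed in the bundle
	leafValue := append(keyDigest[:], rest...) // H(a) || b — what the log would commit to
	leafHash := sha256.Sum256(leafValue)       // Merkle leaf hash (ignoring the RFC 6962 0x00 leaf prefix)

	fmt.Println("H(a):     ", hex.EncodeToString(keyDigest[:]))
	fmt.Println("leaf hash:", hex.EncodeToString(leafHash[:]))
}
```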
I think I'm pretty skeptical that it's a big problem to include the public key—it seems to be causing more trouble than it's worth to omit it. I think good client specifications/libraries can prevent library writers from holding it wrong (if we were talking about end-users, making it bulletproof would be much more important).
Re @znewman01
I'm confused about how verification will work. Verifiers need to reconstruct the entry. Do they have enough information to do so, and can we do so unambiguously?
Specifically:
- Every timestamp_verification_data in a given bundle should contain signatures for the same Rekor entry, right?
- If so, why is there a KindData in each timestamp_verification_data?
- There are cases where the artifact + bundle doesn't have the information we need to reconstruct the Rekor entry (e.g., if we only have RFC 3161 timestamps, there's no KindVersion), right?
Verifiers should only reconstruct the Rekor entry when there is at least one Rekor entry in the bundle, as there is no requirement that the entry was ever put on any Rekor instance. Currently there is sufficient information in the bundle to reconstruct the entry iff the Rekor kind is either intoto or hashedrekord, as those two were the only ones being discussed initially. The content in the bundle is a oneof; this can be extended if we need to capture more types. This will of course make the M x N verification flow grow in each client library. But I don't see this as a huge issue: it's only a mapping from the data in the bundle to the correct entry in Rekor that can be properly canonicalized so the verification can happen. It's still code, but not that complex; it must only be written once, and is easy to test for correctness.
The reason each Rekor entry contains a KindVersion is to make the structure align better with the response from Rekor, and as I would guess the most common scenario (at least initially) is to only have a single entry, I would argue this is a bit more developer friendly, and would prefer a bit of duplication to simplify the data structure.
Yes, you are correct that if no Rekor entry exists in the bundle, there is no KindVersion, so this means that the verifier cannot verify it against any Rekor instance, and has to resort to the signed RFC3161 timestamp during verification, which would be primarily to verify that it was signed during the time the certificate was valid.
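To make the mapping concrete, here is a rough Go sketch of the per-client code being described: a switch from the bundle's content oneof to the Rekor kind/version whose canonicalized form the verifier must reconstruct. The type names and versions here are hypothetical stand-ins, not the generated proto API; only the two kinds discussed in this PR are covered.

```go
package main

import "fmt"

// Hypothetical stand-ins for the generated bundle content types.
type MessageSignature struct{}
type DSSEEnvelope struct{}

// rekorKindVersion maps the bundle's content oneof to the Rekor kind/version
// to canonicalize against before checking the log entry.
func rekorKindVersion(content interface{}) (kind, version string, err error) {
	switch content.(type) {
	case *MessageSignature:
		return "hashedrekord", "0.0.1", nil
	case *DSSEEnvelope:
		return "intoto", "0.0.1", nil
	default:
		return "", "", fmt.Errorf("unsupported bundle content %T", content)
	}
}

func main() {
	kind, version, _ := rekorKindVersion(&MessageSignature{})
	fmt.Println(kind, version) // hashedrekord 0.0.1
}
```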
Verifiers should only reconstruct the Rekor entry when there is at least one Rekor entry in the bundle, as there is no requirement that the entry was ever put on any Rekor instance.
I think this is the main point of disagreement. I was imagining TSAs signing off on Rekor entries, not just the signature bytes.
This just feels more consistent to me. If there are multiple TransparencyLogEntrys from multiple Rekors, I'd expect that they signed the exact same Rekor entry. Why would the thing that the TSA signs be any different?
Maybe relevant, maybe not: there are [subtle crypto attacks](https://www.bolet.org/~pornin/2005-acns-pornin+stern.pdf) that can work backwards to create a new public key that a fixed signature validates against. I don't think this is actually exploitable in our setting, but IMO the more context in the data that gets countersigned the better.
Currently there is sufficient information in the bundle to reconstruct the entry iff the Rekor kind is either intoto or hashedrekord, as those two were the only ones being discussed initially. The content in the bundle is a oneof; this can be extended if we need to capture more types.
So, to clarify, the plan is "each Rekor kind/version has a corresponding content type"? For instance, for Alpine we would add a new one that included the PKGINFO data.
If that's the case, I think I'd want to be more explicit about the mapping. So the doc for content would say: "hashedrekord 0.0.1 => MessageSignature, intoto 0.0.1 => ...".
But then why not just use the Rekor types directly?
Also, what about downstream users who might want to verify other Rekor types? They're supposed to be pluggable, the bundle format should be too IMO.
This will of course make the M x N verification flow grow in each client library. But I don't see this as a huge issue: it's only a mapping from the data in the bundle to the correct entry in Rekor that can be properly canonicalized so the verification can happen. It's still code, but not that complex; it must only be written once, and is easy to test for correctness.
I would be very worried if client libraries needed to reconstruct the entry by e.g. parsing an Alpine ".apk" file (which is what we'd have to do if we weren't adding another content value). If it's just canonicalizing data, I can live with that (though if we can avoid it, we should IMO).
The reason each Rekor entry contains a KindVersion is to make the structure align better with the response from Rekor,
IMO alignment with Rekor's response as it exists today should be a non-goal; I think there are several proposals floating around right now to improve it.
Alignment with Rekor's eventual response is a good idea. However, I think the bundle format should inform the Rekor API, not the other way around.
and as I would guess the most common scenario (at least initially) is to only have a single entry, I would argue this is a bit more developer friendly, and would prefer a bit of duplication to simplify the data structure.
I don't actually think this is more dev friendly. This means that my function to reconstruct the entry for verification needs to access both the TransparencyLogEntry (which may or may not exist) and the content. If you stuck all the information in content, that function is much simpler.
Yes, you are correct that if no Rekor entry exists in the bundle, there is no KindVersion, so this means that the verifier cannot verify it against any Rekor instance, and has to resort to the signed RFC3161 timestamp during verification, which would be primarily to verify that it was signed during the time the certificate was valid.
To me, this complicates the verification flow unnecessarily. "If timestamp, check that the TSA signed X, otherwise, check that Rekor signed Y" seems more complicated than "filter the entries in TimestampVerificationData to find which ones signed X."
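A rough Go sketch of the "filter" shape of that flow (hypothetical types; in reality each candidate would be kept or dropped by verifying that authority's signature over X, not by comparing bytes as done here):

```go
package main

import (
	"bytes"
	"fmt"
)

// TimestampEntry is a hypothetical stand-in for one element of
// TimestampVerificationData: some authority that countersigned some bytes.
type TimestampEntry struct {
	Name        string
	SignedBytes []byte // what this authority countersigned
}

// entriesOver keeps the entries whose countersigned bytes match x, so the
// verifier runs one loop instead of branching on "TSA vs Rekor".
// A real implementation would verify each entry's signature over x here.
func entriesOver(entries []TimestampEntry, x []byte) []TimestampEntry {
	var out []TimestampEntry
	for _, e := range entries {
		if bytes.Equal(e.SignedBytes, x) {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	x := []byte("canonicalized Rekor entry")
	entries := []TimestampEntry{
		{Name: "rekor.sigstore.dev", SignedBytes: x},
		{Name: "some-tsa.example.com", SignedBytes: []byte("raw signature bytes")},
	}
	for _, e := range entriesOver(entries, x) {
		fmt.Println("countersigned the entry:", e.Name)
	}
}
```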
could we lean into this comment: "Could we push something marked experimental pretty soon (with a self-destruct clock, to prevent it from being entrenched), then try to build client libraries with it? I think that will help us shake out the rough edges."
? Feels like we're never going to merge :). We can get this in, start implementation in the various clients, and iterate on the format.
could we lean into this comment:
"Could we push something marked experimental pretty soon (with a self-destruct clock, to prevent it from being entrenched), then try to build client libraries with it? I think that will help us shake out the rought edges."
? Feels like we're never going to merge :). We can get this in, start implementation in the various clients, and iterate on the format.
FWIW I'm pretty happy with this as-is. I still have one high-level concern (articulated in my last comment) but doesn't need to block the PR.
That said—what's the benefit of merging the experimental version to the Cosign repo? I think we decided it makes more sense in a new repo long-term anyway. So what if we just made that (even if it's just on a personal GH account for now), then we could merge whatever we want and do some more formal process to pull it back into the sigstore/ universe.
That said—what's the benefit of merging the experimental version to the Cosign repo? I think we decided it makes more sense in a new repo long-term anyway.
This is my preference too, like a repo named sigstore/protobuf or sigstore/protobuf-experimental.
@znewman01
I think this is the main point of disagreement. ...
There are a few important topics in that comment, and this PR is already getting quite large. When we get a new repo where the bundle goes, how about I open up some issues with your different questions as I think they can be more efficiently covered separately?
edit: s/you/your
There are a few important topics in that comment, and this PR is already getting quite large. When we get a new repo where the bundle goes, how about I open up some issues with your different questions as I think they can be more efficiently covered separately?
Works for me!
Meta comment - It seems like a lot has changed since the last time I looked at it, so I'll need to do another pass. Can we avoid any more significant changes before it's checked in, as it's hard to know what's changing between each pass?
A use case for key IDs: https://github.com/sigstore/cosign/issues/1351
Some KMSes support key rotation, and the same key has different "versions." When we use KMS, we should stick the key version as a hint in the bundle.
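For example, a hedged Go sketch of what such a hint could look like (the type and field names are illustrative, not a proposed schema; the key-version string follows the GCP KMS resource-name layout, and other KMS providers differ):

```go
package main

import "fmt"

// PublicKeyHint is a hypothetical shape for the key-ID hint discussed above:
// enough information for a verifier to ask its KMS for the exact key version
// that produced the signature.
type PublicKeyHint struct {
	// E.g. a GCP KMS resource name including the key version:
	// "projects/p/locations/l/keyRings/r/cryptoKeys/k/cryptoKeyVersions/3".
	KeyID string
}

func main() {
	hint := PublicKeyHint{KeyID: "projects/p/locations/l/keyRings/r/cryptoKeys/k/cryptoKeyVersions/3"}
	// A verifier would resolve the hint to a public key via its KMS client
	// (provider-specific), then verify the bundle's signature with that key.
	fmt.Println("resolve key:", hint.KeyID)
}
```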
https://github.com/sigstore/protobuf-specs is ready whenever this can be moved over.
Thanks @bobcallaway, I'll move and close this PR today.
Closing this PR in favour of https://github.com/sigstore/protobuf-specs/pull/1