
Proposal: New Mediatype - Container Image Encryption

Open lumjjb opened this issue 7 years ago • 63 comments

Overview

We would like to propose a new media type for encrypted layers of a container image. This addition would enable an ecosystem of encrypted container images. It allows users with stricter trust requirements to ensure end-to-end encryption from build to runtime. In addition, it allows users to use a centrally managed repository (e.g. Docker Hub) without exposing their image contents to the registry operator.

Outdated Design Doc (Look at PR for updated): https://docs.google.com/document/d/146Eaj7_r1B0Q_2KylVHbXhxcuogsnlSbqjwGTORB8iw/edit?usp=sharing

Links to presentations:

  • Dockercon US 2019: https://www.youtube.com/watch?v=9LyPUy4XYbs&list=PLkA60AVN3hh-XtoZ8zoZir6wnpaVXGUgk&index=28
  • Kubecon CN 2019: https://www.youtube.com/watch?v=bzHPnlSfM_8

Tracking implementations:

containerd https://github.com/containerd/cri/blob/master/docs/decryption.md https://github.com/containerd/imgcrypt

crio https://github.com/cri-o/cri-o/blob/master/tutorials/decryption.md

buildah https://github.com/containers/buildah/pull/2271

skopeo https://github.com/containers/skopeo/pull/732

Call for Contribution:

  • Quay
  • podman
  • Kaniko
  • Docker CLI

Links:

Details of updated changes can be viewed in the PR. Implementation of those changes can be viewed in this PR.

Proposal written and discussed by: Brandon Lum (@lumjjb), Dimitrios Pendarakis, Hani Jamjoom (@jamjoom), James Bottomley (@jejb), Phil Estes (@estesp), Stefan Berger (@stefanberger), Alaa Youssef within IBM.

This is a follow-up from a discussion with @stevvooe , and @dmcgowan at DockerCon.

Goals

In coming up with a proposal, we considered the following:

  • Application creators should not need to worry about details of encryption. Flows in the DevOps process and the runtime orchestration would be responsible for facilitating encryption.
  • Re-usability of layer de-duplication among container images
  • Encryption should be end-to-end, from the build step to when the container needs to be run on the worker machine. (i.e. registry owner should not be able to view the encrypted contents).
  • Encrypted images created can be for multiple trusted entities, and key management is responsible for managing that trust.
  • The specification should be generic and easy to integrate with and across key management systems.
  • Secondary experimental goal: Allow more granular control of security, to provide possibility of desegregation and more fine-grained security controls. (i.e. base OS, middleware, application can be encrypted separately).

Proposed Changes

Details of updated changes can be viewed in the PR. Implementation of those changes can be viewed in this PR.


When creating a container runtime bundle from encrypted images, the runtime would perform one additional step: decrypting the layers based on the information given in the annotations.

Additional Details

We include some of the interesting discussions we have had about the proposal.

Encryption Standard

The encryption/decryption will be done according to the OpenPGP standard as defined in RFC4880.

  • The layer data will consist of a Symmetrically Encrypted Data Packet as in RFC4880 Section 5.7
  • The wrapped keys (org.opencontainers.image.pgp.keys) will be an array of Public-Key Encrypted Session Key Packets as in RFC4880 Section 5.1
  • Decryption will be done by processing the wrapped key packets followed by the encrypted data packets.
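
As a rough illustration of how these pieces might sit in a manifest, the sketch below builds a layer descriptor whose wrapped keys travel in the org.opencontainers.image.pgp.keys annotation. The media type string with its +enc suffix, the base64 encoding, and the JSON-array-in-a-string layout are all assumptions made for illustration; the PR defines the actual format.

```python
import base64
import hashlib
import json

# Hypothetical media type: the "+enc" suffix is an assumption for illustration.
ENCRYPTED_LAYER_MEDIA_TYPE = "application/vnd.oci.image.layer.v1.tar+enc"
WRAPPED_KEYS_ANNOTATION = "org.opencontainers.image.pgp.keys"


def encrypted_layer_descriptor(ciphertext: bytes, wrapped_key_packets: list) -> dict:
    """Build a descriptor for an already-encrypted layer blob."""
    # Annotation values are strings, so the packet array is serialized as JSON
    # text with each binary packet base64-encoded (an assumed encoding).
    keys_value = json.dumps(
        [base64.b64encode(p).decode("ascii") for p in wrapped_key_packets]
    )
    return {
        "mediaType": ENCRYPTED_LAYER_MEDIA_TYPE,
        "digest": "sha256:" + hashlib.sha256(ciphertext).hexdigest(),
        "size": len(ciphertext),
        "annotations": {WRAPPED_KEYS_ANNOTATION: keys_value},
    }


def wrapped_keys_from(descriptor: dict) -> list:
    """Recover the wrapped key packets; a runtime would process these first."""
    raw = descriptor["annotations"][WRAPPED_KEYS_ANNOTATION]
    return [base64.b64decode(item) for item in json.loads(raw)]
```

A runtime following the order described above would call wrapped_keys_from first, try to unwrap a session key, and only then read the encrypted data packet from the blob.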

Key Management

The purpose of Key Management is to assist in performing distribution, storage and use of keys. We note that this is important to be able to ensure that container runtimes are able to obtain the keys for decryption, and for encrypted container image creators to pass keys to their kubernetes/docker runtimes. However, we note that Key Management in itself can be seen as a separate component that assists the use of Encrypted Container Images. Therefore, we treat the process of Key Management as separate from the design of the Encrypted Container Images itself.

We note, however, that Key Management is important in ensuring technology adoption, so we discuss it briefly. We consider two models of key management; they are not mutually exclusive and can probably be used in tandem.

Own Key Management

Key management is handled by the operator of the cloud, and plugins are provided to allow interfacing with existing Key Management Solutions. In this case, the Key Management Solution (e.g. Vault, Azure Key Vault, IBM KeyProtect) needs to be trusted by the company (run by themselves or by a trusted service).

An example:

Company A has an existing internal Vault service. To build an image, the build machine or developer generates a symmetric key through Vault to perform encryption of the image. This symmetric key is then stored in the Vault service via an Encrypted Container Image Vault plugin. To run the image, the administrator configures the kubernetes cluster with a Vault token to use the internal Vault service via an Encrypted Container Image Vault plugin.

Fully Centralized Untrusted Key Distribution

Key management is handled by users and container runtimes interacting with an untrusted Key Distribution server (alongside container image registry). Private keys are still managed individually by users but no additional external party's trust is required (if server is compromised, no keys are lost). In addition, it provides a central location for users to manage. The trust model is similar to that of Docker Notary server.

An example:

To build an image, the build machine or developer generates a symmetric key and performs the encryption of the image. The symmetric key is then wrapped with the public key of the entities/recipients that it allows access to and registered with the server.

We spin up a new cluster (that uses a docker ID of orgcluster). The kubernetes cluster runtime downloads an encrypted image and identifies that it does not have the necessary keys to decrypt the image. The runtime sends a request for the key for the layer to the FCUKD server. The approver (image owner or delegate) gets a notification of the request, then verifies it, approves it, and submits a wrapped key to the system. Thereafter, the requestor (cluster) can download, unwrap, and use the symmetric key to decrypt the container image. The approval process may be automated through Access Control Lists set by the key owner.

We note that it is possible to provide the symmetric key to the Fully Centralized Untrusted Key Distribution system and perform auto-approvals, but it is highly discouraged since it weakens the security of the system.
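
The request and approval flow above could be modelled roughly as follows. This is only a sketch under assumptions: the class, method names, and status strings are hypothetical, and the server never sees anything except keys that were already wrapped by the approver.

```python
from dataclasses import dataclass, field


@dataclass
class KeyDistributionServer:
    """Untrusted relay: it only ever stores keys that are already wrapped."""
    acl: dict = field(default_factory=dict)      # layer -> set of pre-approved requestors
    pending: list = field(default_factory=list)  # (layer, requestor) awaiting a wrapped key
    wrapped: dict = field(default_factory=dict)  # (layer, requestor) -> wrapped key bytes

    def request_key(self, layer: str, requestor: str) -> str:
        if (layer, requestor) in self.wrapped:
            return "available"
        if (layer, requestor) not in self.pending:
            self.pending.append((layer, requestor))
        # An ACL entry means the owner pre-approved the requestor; the owner
        # (or a delegate) still performs the wrapping, not the server.
        if requestor in self.acl.get(layer, set()):
            return "auto-approved"
        return "awaiting-approval"

    def submit_wrapped_key(self, layer: str, requestor: str, wrapped_key: bytes) -> None:
        """Called by the approver after wrapping the key to the requestor's public key."""
        if (layer, requestor) in self.pending:
            self.pending.remove((layer, requestor))
        self.wrapped[(layer, requestor)] = wrapped_key

    def download(self, layer: str, requestor: str) -> bytes:
        return self.wrapped[(layer, requestor)]
```

Because the server holds only wrapped keys, compromising it yields no usable key material, which is the property the trust model above relies on.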

lumjjb avatar Jul 09 '18 17:07 lumjjb

In general, I think this proposal looks good, although, I would recommend namespacing the annotation names, per https://github.com/opencontainers/image-spec/blob/master/annotations.md.

This might need more review from security experts. I'd be worried about defining an algo/keyid schema and would prefer to use something predefined, if possible.

Is there a reference implementation we can use to test this approach with?

stevvooe avatar Jul 10 '18 00:07 stevvooe

In general, the OCI deals with already-tested-in-the-field technologies (at least that's what recent discussions about distribution have led me to believe). While having a discussion with us is definitely encouraged (since we probably will have opinions on the design), OCI is extensible specifically to allow vendors to try out their ideas before standardisation.

Re-usability of layer de-duplication among container images.

I'm a little bit worried about this goal for multiple reasons.

  • Deduplication tables act like a form of compression and we know from BEAST that this general construction can result in attacks on most crypto primitives (I'm not sure if this is actually true for dedup tables but it is something to keep in mind). In addition, this also allows for fingerprinting of encrypted data (something that would not normally be possible -- because ideally you want the same cleartext to have different ciphertext).

  • This might make it difficult for us to implement content-defined-chunking for files, as well as removing the sequential archive concept from our format (though of course we would probably just re-design it in that case, but it is something to keep in mind).

Encrypted backup tools like restic do have very interesting designs for chunk-deduplicated snapshot-based filesystem images -- have you taken a look at how they work (because that design should influence image-spec if we want to purge tar archives from the format)?

I'm currently writing a blog post that details how we can improve image-spec to not use tar archives anymore, and to add chunk-deduplicated snapshot-based filesystem images. But of course that will take quite a while to make into a proposal (since it also requires having a realworld implementation).

cyphar avatar Jul 10 '18 00:07 cyphar

Re-usability of layer de-duplication among container images.

I'm a little bit worried about this goal for multiple reasons.

Deduplication tables act like a form of compression and we know from BEAST that this general construction can result in attacks on most crypto primitives (I'm not sure if this is actually true for dedup tables but it is something to keep in mind).

I don't think this applies to this proposal. BEAST is based on attacker provided (or attacker-modified) plaintext. For example, an HTTPS response which contains data from a user-supplied form would be one way for the attacker to influence the plaintext. But in this case, the encrypter is only signing layers that they've decided to build themselves. If you're publishing builds of something you develop yourself, there would be no attacker-influenced plaintext. You'd only run into trouble with BEAST if you were encrypting attacker-influenced layers. And if you have attacker-influenced layers, you have bigger problems than BEAST ;).

This might make it difficult for us to implement content-defined-chunking for files, as well as removing the sequential archive concept from our format (though of course we would probably just re-design it in that case, but it is something to keep in mind).

This approach (append +enc to the media type and stuff in some annotations) seems pretty generic. You could even have a Merkle tree where each node was encrypted, although only folks with access to the key would be able to walk that tree for garbage collection and such. Where the registry-walkable trees are required, you'd have to shuffle things around a bit to encrypt the payloads but not the Merkle links. So, as you say, potentially some impacts, but nothing that seems difficult to work around.

enc.keyid - The reference to the key to use to perform the decryption of the layer.

Another approach to key distribution would be to encrypt to multiple public keys. For example, OpenPGP encrypts the payload with a random symmetric key, and then encrypts that symmetric key to one or more public keys. I don't know how that approach would fit into your enc.algo property, but you could always use additional enc.* properties to support it. If we want to support that use case, we may want to make enc.keyid an array of strings, instead of making it a single string.

wking avatar Jul 10 '18 01:07 wking

@wking

I don't think this applies to this proposal.

I think you missed the second part of that point, which is that (unrelated to BEAST), taking advantage of deduplication of layers opens you up to fingerprinting attacks (unless the plan is to use a hash of the ciphertext -- in which case you should get zero deduplication because ciphertext should appear random and is not consistent when regenerated).
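
To make the trade-off concrete, here is a small sketch. The cipher is a toy (SHA-256 used as a keystream generator) chosen only so the example runs with the standard library; it is not secure and is not what the proposal uses. It shows that a fresh nonce gives different ciphertext digests for the same layer, while a content-derived nonce gives repeatable digests that dedup but also reveal that two blobs hold the same data.

```python
import hashlib
import os


def toy_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). NOT secure; illustration only."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    # Prefix the nonce so a holder of the key could regenerate the keystream.
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def blob_digest(blob: bytes) -> str:
    return "sha256:" + hashlib.sha256(blob).hexdigest()


layer = b"identical layer contents"
key = os.urandom(32)

# Fresh random nonce per encryption: same cleartext, different ciphertext,
# so a digest-addressed registry sees two unrelated blobs (no dedup).
random_a = toy_encrypt(key, os.urandom(16), layer)
random_b = toy_encrypt(key, os.urandom(16), layer)

# Nonce derived from the content: the ciphertext repeats, so the registry can
# dedup it, but an observer can also tell both blobs carry the same data.
content_nonce = hashlib.sha256(layer).digest()[:16]
fixed_a = toy_encrypt(key, content_nonce, layer)
fixed_b = toy_encrypt(key, content_nonce, layer)
```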

cyphar avatar Jul 10 '18 09:07 cyphar

I think you missed the second part of that point, which is that (unrelated to BEAST), taking advantage of deduplication of layers opens you up to fingerprinting attacks...

Can you link docs for "fingerprinting attacks"? Searching is turning up things like TOR traffic analysis, which doesn't sound like what you mean.

Going back to the original:

Re-usability of layer de-duplication among container images

For example, if Alice distributes her sensitive application with:

  • Image 1
    • Unencrypted base layer(s) (e.g. from an Alpine 3.7 image)
    • Encrypted app v1.0
  • Image 2
    • Different unencrypted base layer(s) (e.g. from an Alpine 3.8 image)
    • Encrypted app v1.0 (same blob as for image 1)

That's de-duping, because there's only one blob for the encrypted app. I don't see how it would increase your exposure to attacks, although perhaps the fact that the encrypted app presumably works with both Alpine 3.7 and 3.8 would give you some knowledge of the encrypted content which could be used for a known-plaintext approach. If you were concerned about that sort of thing, you could always encrypt all of your layers.
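
A minimal sketch of that scenario, assuming a content-addressed store keyed by the digest of whatever bytes are pushed (placeholder bytes stand in for real layer tarballs and ciphertext):

```python
import hashlib


def digest(blob: bytes) -> str:
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def push(store: dict, layers: list) -> list:
    """Store each layer blob under its digest and return the manifest's layer list."""
    refs = []
    for blob in layers:
        store.setdefault(digest(blob), blob)  # an existing blob is not stored again
        refs.append(digest(blob))
    return refs


# Placeholder contents; only their identity matters for the example.
alpine_37 = b"alpine 3.7 base layer (unencrypted)"
alpine_38 = b"alpine 3.8 base layer (unencrypted)"
encrypted_app = b"<ciphertext of app v1.0>"

store = {}
image1 = push(store, [alpine_37, encrypted_app])
image2 = push(store, [alpine_38, encrypted_app])
```

Both manifests reference the same digest for the encrypted app, so the store ends up holding three blobs for the two images. The sharing works only because the identical ciphertext blob is reused; it does not require the registry to see the cleartext.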

And obviously folks will be exposed to ciphertext-only attacks, so you shouldn't be publishing anything that you don't want your grandkids to read ;).

But as long as you're using per-blob session keys (as in the OpenPGP approach I linked above) and periodically rotate your main key (which is only used for signing the short, random session keys), I think this approach seems pretty solid.

wking avatar Jul 10 '18 11:07 wking

@wking

Another approach to key distribution would be to encrypt to multiple public keys. For example, OpenPGP encrypts the payload with a random symmetric key, and then encrypts that symmetric key to one or more public keys. I don't know how that approach would fit into your enc.algo property, but you could always use additional enc.* properties to support it. If we want to support that use case, we may want to make enc.keyid an array of strings, instead of making it a single string.

Do the 'more public keys' in OpenPGP belong to all the people you want to communicate with? I would say our current thinking is that one would place an encrypted symmetric layer encryption key on some server and decrypt it once an entitled user places a request for the layer decryption key. The user would have to pass along his public key (certificate) and we would wrap the key with that public key and pass it back. This allows adding users to an access control list for a particular key in the future and passing an individually wrapped key back for each one of them.

stefanberger avatar Jul 10 '18 14:07 stefanberger

Do the 'more public keys' in OpenPGP belong to all the people you want to communicate with?

They could. You could also encrypt to a key shared by the QA team, and a key shared by the production team, etc.

I would say our current thinking is that one would place an encrypted symmetric layer encryption key on some server and decrypt it once an entitled user places a request for the layer decryption key.

That works too, it's just about whether the encrypted session key is stored on the blob or independently. If it's on the blob, it's easier to mirror, because it's transmitted through whatever channel you already use for blob mirroring. If it's in a separate system, you can add/remove recipients without adjusting the blob and rootwards Merkle tree. I expect that the optimal solution will depend on the individual users and use cases. And either way we go with this, the other approach is only a new media-type extension away, so recovery is possible if our initial best-guess is wrong.

wking avatar Jul 10 '18 17:07 wking

@stevvooe

I would recommend namespacing the annotation names, per https://github.com/opencontainers/image-spec/blob/master/annotations.md.

Modified the proposal to reflect org.opencontainers.image namespace.

This might need more review from security experts. I'd be worried about defining an algo/keyid schema and would prefer to use something predefined, if possible.

Just to clarify, are you referring to just using a specific algorithm? Or are you looking more for being able to point to an RFC reference of some sort such as https://tools.ietf.org/html/rfc5116 ?

Is there a reference implementation we can use to test this approach with?

Not at the moment. This is our next step. Our current thoughts are to prototype something in containerd for the runtime. We are most definitely open to suggestions on this.

@wking

I think you articulated very clearly what we wanted out of the "de-duplication". Thanks :).

Another approach to key distribution would be to encrypt to multiple public keys. For example, OpenPGP encrypts the payload with a random symmetric key, and then encrypts that symmetric key to one or more public keys. I don't know how that approach would fit into your enc.algo property, but you could always use additional enc.* properties to support it. If we want to support that use case, we may want to make enc.keyid an array of strings, instead of making it a single string.

I really like the idea of wrapping the keys! This would simplify some of the key management problems and infrastructure a lot.

However, I am a little unsure about the scenario where we would want to dynamically allow new users/parties to use our image. A specific case I had in mind was bootstrapping a cluster, i.e. when we set up a new kubernetes cluster, the cluster would have a CA. If we wanted to provide access to an encrypted image to a new cluster, we would wrap the symmetric key with the public key of the cluster CA.

I am in favor of using the wrapped keys along with the image as you proposed, and also providing the option to interface with a key management system in the event that the wrapped keys in the registry are not for the user. I am on board with making a wrapped key array.

lumjjb avatar Jul 10 '18 20:07 lumjjb

I suppose the next question is how to implement this and what command line parameters to pass. I suppose docker commit should be instrumented to support this first. docker build may be another later candidate.

Would we want to manage symmetric keys internally somehow with a new set of commands? We could pass it directly to docker commit --encryption-key file:<path> or reference the key by name or id if internally managed: docker commit --encryption-key name:<keyname>?

Also, how do we pass our one or multiple friends' public key via command line? docker commit --wrapping-pubkey file:<pubkey> and allow multiple of those be passed? Or pass in a config file that references those public keys/certs?

Do we want to give control over the individual layers or just the last one? docker commit --encrypt-layers=<last|all>. And we would refuse to re-encrypt already encrypted layers (in the base image)?
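
To get a feel for how these options would behave together, here is a small argument-parsing sketch. The flag names and the file:/name: prefixes are taken from the questions above and are proposals only, not existing docker options; the program name and the default of "last" for --encrypt-layers are assumptions.

```python
import argparse


def key_source(value: str) -> tuple:
    """Parse 'file:<path>' or 'name:<keyname>' into (scheme, reference)."""
    scheme, sep, ref = value.partition(":")
    if not sep or scheme not in ("file", "name") or not ref:
        raise argparse.ArgumentTypeError("expected file:<path> or name:<keyname>")
    return scheme, ref


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags mirroring the discussion; "commit-sketch" is a made-up name.
    parser = argparse.ArgumentParser(prog="commit-sketch")
    parser.add_argument("--encryption-key", type=key_source)
    # May be repeated, once per recipient public key.
    parser.add_argument("--wrapping-pubkey", type=key_source, action="append", default=[])
    parser.add_argument("--encrypt-layers", choices=["last", "all"], default="last")
    return parser
```

Repeating --wrapping-pubkey accumulates recipients in order, which matches the "allow multiple of those be passed" idea; a config file of recipients would be an alternative front end to the same list.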

stefanberger avatar Jul 12 '18 14:07 stefanberger

Also, how do we pass our one or multiple friends' public key via command line? docker commit --wrapping-pubkey file:<pubkey> and allow multiple of those be passed?

gpg uses --symmetric (with a passphrase) for symmetric encryption and --recipient <name|email|key-id?> (which can be given multiple times) for wrapped session keys.

wking avatar Jul 12 '18 14:07 wking

Do we want to tie this in with gpg in some way or manage recipients in some way ourselves?

stefanberger avatar Jul 12 '18 15:07 stefanberger

The pgp public key server may come in handy...

stefanberger avatar Jul 12 '18 15:07 stefanberger

On Thu, 2018-07-12 at 08:25 -0700, Stefan Berger wrote:

Do we want to tie this in with gpg in some way or manage recipients in some way ourselves?

Pretty much everything is following the spirit (if not the letter) of the s/mime spec PKCS#7:

https://tools.ietf.org/html/rfc2315

for Enveloped-data, so this proposal should follow it as closely as is practicable

James

jejb avatar Jul 12 '18 15:07 jejb

Pretty much everything is following the spirit (if not the letter) of the s/mime spec PKCS#7: https://tools.ietf.org/html/rfc2315 for Enveloped-data...

That section talks about the same random-session-key-encrypted-to-each-recipient approach. Do you see a difference between PKCS#7 and OpenPGP on that score?

wking avatar Jul 13 '18 02:07 wking

PGP seems to have its own message format. Section 5.1 (https://tools.ietf.org/html/rfc4880#section-5) describes the support for multiple recipients:

5.1.  Public-Key Encrypted Session Key Packets (Tag 1)

   A Public-Key Encrypted Session Key packet holds the session key used
   to encrypt a message.  Zero or more Public-Key Encrypted Session Key
   packets and/or Symmetric-Key Encrypted Session Key packets may
   precede a Symmetrically Encrypted Data Packet, which holds an
   encrypted message.  The message is encrypted with the session key,
   and the session key is itself encrypted and stored in the Encrypted
   Session Key packet(s).  The Symmetrically Encrypted Data Packet is
   preceded by one Public-Key Encrypted Session Key packet for each
   OpenPGP key to which the message is encrypted.  The recipient of the
   message finds a session key that is encrypted to their public key,
   decrypts the session key, and then uses the session key to decrypt
   the message.

So this sounds good for supporting multiple recipients if we were to just use OpenPGP tools for creating the encrypted layers. We would follow a standard ... Though, it also ties us into the PGP message format. If we wanted to extend the system with a more dynamic handling of users that can decrypt the layers of an image, we'd later have to be able to parse the OpenPGP message to find the encrypted message and also find an id of the symmetric key.

Ideally we should be able to find the following information somehow either in the OpenPGP data stream or separate the encrypted message and the metadata to decrypt the message in a form like this one here extending the above proposed image annotations:

"annotations": {
    "enc.keyid": "0x12345678",
    "enc.keyid_owner_account": "image-author",
    "enc.wrapped": [{
        "key_owner": "[email protected]",
        "key_id": "0x11223344",
        "wrapped_key": "0x76923749238749286565..."
    }, {
        "key_owner": "[email protected]",
        "key_id": "0x44332211",
        "wrapped_key": "0x983r093275765"
    }]
}

If I am [email protected], I will use my key 0x11223344 to decrypt the wrapped_key part to get to the symmetric key. If I don't find myself in the enc.wrapped key list I can go to the server and ask for enc.keyid 0x12345678 under account image-author for the symmetric key and will get it back encrypted with my public key, assuming I am on the ACL for this key. Would we want to try an OpenPGP type of decryption of the layer first (assuming the layer is in OpenPGP format) and if this fails fall back to asking the server for the key? I am just wondering whether OpenPGP is suitable to do this or whether there's some tool implementing PKCS-7 type of messages that seems to be more suitable?
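
The lookup-then-fallback logic could look roughly like this. It is a sketch only: the function name and return convention are hypothetical, the annotations are modelled as a plain Python dict with the same field names as the example, and the wrapped key values are placeholders.

```python
def pick_wrapped_key(annotations: dict, my_key_ids: set) -> tuple:
    """Return ("local", wrapped_key) if an entry targets one of our keys,
    otherwise ("server", keyid) to signal that the key server must be asked."""
    for entry in annotations.get("enc.wrapped", []):
        if entry["key_id"] in my_key_ids:
            return "local", entry["wrapped_key"]
    # No entry for us: fall back to requesting enc.keyid from the server,
    # which would return it wrapped to our public key if we are on the ACL.
    return "server", annotations["enc.keyid"]


annotations = {
    "enc.keyid": "0x12345678",
    "enc.wrapped": [
        {"key_id": "0x11223344", "wrapped_key": "<wrapped key A>"},
        {"key_id": "0x44332211", "wrapped_key": "<wrapped key B>"},
    ],
}
```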

stefanberger avatar Jul 13 '18 11:07 stefanberger

enc.keyid: "0x12345678",
enc.keyid_owner_account: "image-author",

I don't think we need an enc.keyid_owner_account. Will the symmetric key-store really need to shard these by author? If you're concerned about garbage collection, I think you want the key-store listening for blob-deletion, so it can remove key 123 when the last blob using that key is deleted.

Similarly, I think we can drop nominal-owner info from the wrapped array. Owner info will be accessible via the recipient ID (e.g. attached to an OpenPGP key or X.509 cert) where the key<->owner relationship can be signed by others. I'd be concerned about folks giving unsigned owner assertions here more weight than they deserve.

And if you want, the key-store could have its own public key, and go into the wrapped array too. Will folks really use the same session key for multiple layers? Encrypting a random session key to keys.example.com seems safer. Users not directly authorized (i.e. able to decrypt one of the wrapper payloads) would notice the keys.example.com wrapper and apply for decryption. As a bonus, this allows one key-store architecture to be "these users are authorized for all blobs", in which case it only needs to store its own key, and not maintain any layer <-> authorized-users mapping.

Also, annotation values must be strings (previous discussion starting here), so you'll need to serialize to a string, mint a new descriptor property for the wrapped array, or make an encrypted media type as a separate blob:

  • manifest's layers[] descriptor points at the encrypted blob
  • the encrypted blob has wrapped keys in an array, and a data descriptor pointing at the encrypted layer.
  • encrypted layer
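
The separate-blob option could be sketched like this. Both media type strings and the field names wrapped and data are hypothetical placeholders, and the ciphertext is a stand-in:

```python
import hashlib
import json


def descriptor(media_type: str, blob: bytes) -> dict:
    """Minimal content descriptor: media type, digest, and size of a blob."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
    }


# Hypothetical media types, invented for this example.
LAYER_ENC = "application/vnd.example.layer.encrypted-data"
WRAPPER = "application/vnd.example.layer.encrypted+json"

encrypted_layer = b"<ciphertext>"  # stand-in for the real encrypted layer

# The wrapper blob carries the wrapped keys as a real array (no string
# serialization needed) plus a descriptor pointing at the ciphertext.
wrapper_blob = json.dumps({
    "wrapped": [
        {"key_id": "0x11223344", "wrapped_key": "<base64 packet>"},
    ],
    "data": descriptor(LAYER_ENC, encrypted_layer),
}, sort_keys=True).encode("utf-8")

# The manifest's layers[] entry points at the wrapper, not at the ciphertext.
manifest_layer = descriptor(WRAPPER, wrapper_blob)
```

One consequence of this layout is that changing the recipient list changes the wrapper blob's digest, and therefore the manifest, but leaves the large encrypted layer blob untouched.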

wking avatar Jul 13 '18 13:07 wking

Would we want to try an OpenPGP type of decryption of the layer first (assuming the layer is in OpenPGP format) and if this fails fall back to asking the server for the key? I am just wondering whether OpenPGP is suitable to do this or whether there's some tool implementing PKCS-7 type of messages that seems to be more suitable?

For both of OpenPGP and S/MIME, the off-the-shelf approach would be to leave the descriptor schema alone and use multipart/encrypted descriptors in the manifest's layers. Then the referenced blobs would have payloads like this for OpenPGP and this for S/MIME.

wking avatar Jul 13 '18 13:07 wking

@wking The enc.keyid_owner_account would at least reduce the possibility of a key_id collision among different users, though not completely eliminate it (per user) but the key server could refuse two distinct keys (per account) that map to the same key id. It would depend of course how long we make these key IDs for symmetric keys. RFC 4880 does seem to hint at a similar problem for their Key Ids for public keys here. Besides that it's not clear whether there should be a centralized key server that holds much information about keys (and be a high value target) or whether this server forwards requests for symmetric keys to servers that the owners are running themselves. Such request could be forwarded to the owners' server not by key id but by account name.

The above JSON was primarily meant as an example to show what information may be needed.

stefanberger avatar Jul 13 '18 13:07 stefanberger

The enc.keyid_owner_account would at least reduce the possibility of a key_id collision among different users, though not completely eliminate it (per user) but the key server could refuse two distinct keys (per account) that map to the same key id. It would depend of course how long we make these key IDs for symmetric keys.

Yeah. I think the solution to that is to use longer hashes for the IDs. I'm not surprised that RFC 4880 is warning about collisions for folks using only 8 bytes ;). But even with longer IDs, there could still be collisions. I don't think collisions are a problem though. If a collision between Alice and Bob's keys makes the target ambiguous, they can each just attempt decryption to see if the payload was really decrypted to them. There are some denial-of-service vulnerabilities in this area (asking Alice to attempt decryption of unrelated packets), but you don't avoid them with owner-string namespacing.

Besides that it's not clear whether there should be a centralized key server that holds much information about keys (and be a high value target) or whether this server forwards requests for symmetric keys to servers that the owners are running themselves. Such request could be forwarded to the owners' server not by key id but by account name.

Why not use the key ID as the account name, wherever you're keeping the (key/user)-to-access-server mapping?

wking avatar Jul 13 '18 14:07 wking

@wking Ok, so we can get rid of the account name if the keyid is sufficiently long to be unique and the central server, that would presumably somehow notify the owner of the key, refuses duplicate keyids to be registered with it. [Some sort of registration seems to be necessary.] I suppose for troubleshooting or just being able to contact an owner the central server should be able to tell who the key owner is.

stefanberger avatar Jul 13 '18 14:07 stefanberger

To avoid registering bogus key IDs in the central server, one could use the static account info to contact the final server.

stefanberger avatar Jul 13 '18 14:07 stefanberger

@stefanberger, central servers and alternatives seem out of scope here (maybe they would be in-scope for the distribution spec?). Once you have a set of recipient IDs and payloads encrypted to those IDs, you can have many independent ways of actually decrypting those payloads without impacting the image format.

wking avatar Jul 13 '18 15:07 wking

On Fri, 2018-07-13 at 07:09 -0700, W. Trevor King wrote:

The enc.keyid_owner_account would at least reduce the possibility of a key_id collision among different users, though not completely eliminate it (per user) but the key server could refuse two distinct keys (per account) that map to the same key id. It would depend of course how long we make these key IDs for symmetric keys.

Yeah.  I think the solution to that is to use longer hashes for the IDs.  I'm not surprised that RFC 4880 is warning about collisions for folks using only 8 bytes ;).  But even with longer IDs, there could still be collisions.  I don't think collisions are a problem though.  If a collision between Alice and Bob's keys makes the target ambiguous, they can each just attempt decryption to see if the payload was really decrypted to them.

I think the proposal has sha256 as the keyid hash, which should not have collisions (unless it has a serious compromise).

Besides that it's not clear whether there should be a centralized key server that holds much information about keys (and be a high value target) or whether this server forwards requests for symmetric keys to servers that the owners are running themselves. Such request could be forwarded to the owners' server not by key id but by account name.

Why not use the key ID as the account name, wherever you're keeping the (key/user)-to-access-server mapping?

I really think, for an OCI proposal, key distribution should be feasible but not explicitly spelled out because I can see how you distribute the keys for the image being a significant cloud native function, and one that has to comport with all the current keystore ideas in CNCF, so we want to take an enabling but not prescriptive approach.

James

jejb avatar Jul 13 '18 15:07 jejb

... because I can see how you distribute the keys for the image being a significant cloud native function, and one that has to comport with all the current keystore ideas in CNCF, so we want to take an enabling but not prescriptive approach.

Are there CNCF decryption-API proposals? Then you could authorize decryption at request time "yes, Alice is authorized for key 123 decryptions now, so I'll pass back the decrypted session key". If instead you distribute a long-running key itself ("Alice is authorized for key 123 now, so I'll pass back its private key"), Alice will have non-revokable access to anything ever encrypted to that key.
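
The difference between the two models can be sketched as a service that keeps the long-lived private key to itself and checks authorization on every request. The class and method names are hypothetical, and the unwrap callable stands in for whatever real private-key operation the key store would perform.

```python
class DecryptionService:
    """Holds the long-lived private key; callers only ever receive session keys."""

    def __init__(self, unwrap):
        self._unwrap = unwrap      # callable: wrapped session key -> session key
        self._authorized = set()   # (user, key_id) pairs

    def grant(self, user: str, key_id: str) -> None:
        self._authorized.add((user, key_id))

    def revoke(self, user: str, key_id: str) -> None:
        self._authorized.discard((user, key_id))

    def decrypt_session_key(self, user: str, key_id: str, wrapped: bytes) -> bytes:
        # Authorization is evaluated per request, so a revocation applies to
        # every later request without re-encrypting any blob.
        if (user, key_id) not in self._authorized:
            raise PermissionError(f"{user} is not authorized for key {key_id}")
        return self._unwrap(wrapped)
```

Session keys that were already returned stay known to the caller, so revocation here only limits future decryptions; handing out the private key itself would remove even that limit.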

wking avatar Jul 13 '18 15:07 wking

If we were to use OpenPGP for managing friends' public keys, then would we also want to use it for the layer encryption directly and take its encrypted output as the encrypted layer? Or only use it to manage public keys? I guess I am not clear what others' opinions are now. I don't think the format is ideal, but I am not sure whether designing our own is better. What I don't like about it is that it encodes Key IDs in the Public-Key Encrypted Session Key Packets that don't give a hint of who these keys are for. If keys are identifiable by their owners' email address, then this information should be preserved I think. The original identifiers of those keys may be useful if some day I were to build a new version of the image or add a new user to it. Those email addresses seem more user friendly than 4 byte key Ids.

The body of this packet consists of:

     - A one-octet number giving the version number of the packet type.
       The currently defined value for packet version is 3.

     - An eight-octet number that gives the Key ID of the public key to
       which the session key is encrypted.  If the session key is
       encrypted to a subkey, then the Key ID of this subkey is used
       here instead of the Key ID of the primary key.

     - A one-octet number giving the public-key algorithm used.

     - A string of octets that is the encrypted session key.  This
       string takes up the remainder of the packet, and its contents are
       dependent on the public-key algorithm used.
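
The field layout quoted above can be read with a few lines of Python. This is only an illustrative sketch: `PKESKBody` and `parse_pkesk_body` are made-up names, and the function expects the packet body alone, with the OpenPGP packet header already stripped.

```python
import struct
from typing import NamedTuple


class PKESKBody(NamedTuple):
    version: int                  # one octet, currently 3
    key_id: bytes                 # eight octets
    algorithm: int                # one octet, public-key algorithm ID
    encrypted_session_key: bytes  # remainder of the packet


def parse_pkesk_body(body: bytes) -> PKESKBody:
    """Split a version 3 packet body into the four fields listed above."""
    if len(body) < 10:
        raise ValueError("body is shorter than the 10 fixed octets")
    # 1 octet version, 8 octets Key ID, 1 octet algorithm.
    version, key_id, algorithm = struct.unpack(">B8sB", body[:10])
    if version != 3:
        raise ValueError(f"unsupported packet version {version}")
    return PKESKBody(version, key_id, algorithm, body[10:])
```

Note that the only recipient information the parser can recover is the Key ID, which is the point under discussion.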

stefanberger avatar Jul 13 '18 16:07 stefanberger

What I don't like about it is that it encodes Key IDs in the Public-Key Encrypted Session Key Packets that don't give a hint of who these keys are for.

That information is associated with the public key, which you can retrieve by using the key ID. This is very similar to X.509, where you have a private and public key, as well as a separate certificate asserting the identity of the private-key-holder. What do you gain by embedding that metadata (as an unsigned assertion) in the recipient list? If you want to resolve that metadata, you should use the key ID to retrieve metadata which has been signed by parties you trust (e.g. in your web of trust, your trust-on-first-use database, a shared certificate authority, etc.). For example:

$ gpg --search-keys 0xBB729EC7
gpg: searching for "0xBB729EC7" from hkp server keys.gnupg.net
(1)	CoreOS Application Signing Key <[email protected]>
	  4096 bit RSA key FC8A365E, created: 2016-03-02, expires: 2021-03-01
Keys 1-1 of 1 for "0xBB729EC7".  Enter number(s), N)ext, or Q)uit > q

The analogous PKCS#7 object is similar in just recording the IssuerAndSerialNumber:

RecipientInfo ::= SEQUENCE {
     version Version,
     issuerAndSerialNumber IssuerAndSerialNumber,
     keyEncryptionAlgorithm
       KeyEncryptionAlgorithmIdentifier,
     encryptedKey EncryptedKey }

so you'd have to use that to look up the cert if you wanted to find the recipient's name. A difference between the PKCS#7 approach and the OpenPGP approach is that the former only supports one party (the issuer) for asserting metadata (like the recipient's name or domain name), while OpenPGP references the key itself, and allows multiple parties to assert metadata associated with that key.

wking avatar Jul 13 '18 16:07 wking

I like the OpenPGP spec. It seems to be an easy option for users to share keys without a trusted certificate hierarchy. Local users that want to use encrypted containers can use a combination of gpg and docker to run encrypted images (by utilizing the gpg keychain).

In addition, it is convenient that there exists a golang library that implements OpenPGP :) :).

The data layer would be the ciphertext packet and the wrapped keys packets would be the enc.keys array in the annotations.

I will start updating the original proposal again to note some of the ideas/comments in the discussions.

For the scenario raised where we need to pass keys to new users, @stefanberger and I have discussed a design with the "Fully Centralized Untrusted Key Server" which will work with the OpenPGP model. The keys associated with the image would be wrapped for the trusted parties, and those trusted parties would be able to re-wrap the keys.

lumjjb avatar Jul 16 '18 03:07 lumjjb

Updated proposal. In addition, @stefanberger, @estesp and I are looking into implementation details with containerd and possibly buildkit.

lumjjb avatar Jul 17 '18 22:07 lumjjb

Updated proposal.

Looks like you have stale enc.algo and enc.keyid references now that the meat is all under org.opencontainers.image.enc.keys.

Also, as I mentioned earlier, annotation values must be strings, so the proposal graphic should either move the keys property out of annotations or convert the value to a string with something like:

"annotations": {
  "org.opencontainers.image.enc.keys": "[\"wrapping1\",\"wrapping2\"]"
}

And there seems to be a dangling "registered with a Key ID to the organization namespace" paragraph fragment. Maybe leftover from a partial edit?

wking avatar Jul 17 '18 22:07 wking

Thanks @wking ! I have made the changes!

Also, as I mentioned earlier, annotation values must be strings, so the proposal graphic should either move the keys property out of annotations or convert the value to a string with something like:

Made the keys comma-delimited base64 strings.
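
Roughly, the encoding could look like the sketch below. The helper names `encode_wrapped_keys` and `decode_wrapped_keys` are illustrative only and are not taken from the proposal or any implementation. It assumes every wrapped key is a non-empty byte string and relies on the standard base64 alphabet not containing a comma.

```python
import base64
from typing import Dict, List

ANNOTATION_KEY = "org.opencontainers.image.enc.keys"


def encode_wrapped_keys(wrapped_keys: List[bytes]) -> Dict[str, str]:
    """Pack wrapped keys into a single string-valued annotation."""
    encoded = [base64.b64encode(key).decode("ascii") for key in wrapped_keys]
    return {ANNOTATION_KEY: ",".join(encoded)}


def decode_wrapped_keys(annotations: Dict[str, str]) -> List[bytes]:
    """Recover the wrapped keys from the annotation value."""
    value = annotations.get(ANNOTATION_KEY, "")
    if not value:
        return []
    # validate=True rejects characters outside the base64 alphabet.
    return [base64.b64decode(part, validate=True) for part in value.split(",")]
```

The annotation value stays a plain string, and the original byte strings come back in the same order after decoding.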

lumjjb avatar Jul 18 '18 13:07 lumjjb