TPM Node Attestation Design
Following up with @evan2645 and @APTy:
Plan for attestation: I was thinking of something very similar to how x509pop works, but without the intermediate certificates. The agent requests to attest (providing its public key), the server checks whether it has that public key recorded, the server sends a nonce encrypted with the node TPM's public key, and the agent responds with the decrypted nonce.
Plan for getting public keys into the server: For this, I think we could add a table to the SPIRE database with a row for each public key. Then the user could put public keys into the database with either the SPIRE CLI or the API.
I'd love to hear others' thoughts on these plans! These plans are x509-specific, but we could also talk about verifying on other data inside the TPM.
It is dangerous to not have the agent also contribute to the nonce, otherwise the agent essentially becomes a decryption oracle to a rogue server. Restrictions on nonce can mitigate this but really both sides need to contribute to the material.
The x509pop attestor accomplished this by having the server send a nonce. The agent generated its own nonce, combined it with the server nonce, and signed it with the private key. It then sent both the agent nonce and the signed value to the server. The server combined the agent nonce with its own nonce in the same manner and verified the signature with the public key.
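The mutual-nonce pattern described above can be sketched as follows. This is a minimal illustration, not the x509pop plugin's actual code: the key pair stands in for the TPM-held key, and `combineNonces` is a hypothetical helper showing one way (hashing the concatenation) that the two contributions might be combined.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// combineNonces mirrors the idea above: both sides contribute material,
// so neither can turn the other into a signing/decryption oracle.
func combineNonces(serverNonce, agentNonce []byte) []byte {
	h := sha256.New()
	h.Write(serverNonce)
	h.Write(agentNonce)
	return h.Sum(nil)
}

func main() {
	// Stand-in for the TPM-held key pair.
	key, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)

	serverNonce := make([]byte, 32)
	agentNonce := make([]byte, 32)
	rand.Read(serverNonce)
	rand.Read(agentNonce)

	// Agent signs the combined nonce and sends (agentNonce, sig) back.
	digest := combineNonces(serverNonce, agentNonce)
	sig, _ := ecdsa.SignASN1(rand.Reader, key, digest)

	// Server recomputes the same combination and verifies the signature.
	ok := ecdsa.VerifyASN1(&key.PublicKey, combineNonces(serverNonce, agentNonce), sig)
	fmt.Println("signature valid:", ok)
}
```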
re: getting the keys into the server
We've noodled a bit with an interface that gives plugins access to a namespaced key value store riding on top of the datastore, as well as an interface through the registration API for plugins to receive custom payloads. In this way, the CLI (via the registration API) could direct public keys into the plugin, which could store them back to its namespace in the data store. During attestation it could check the namespace for the pubkey.
I'm curious what kind of SPIFFE ID this selector would produce? Maybe one with a hash of the public key? Are there any selectors we could derive (maybe via metadata provided along with the public key)?
Any thoughts on using a TPM quote instead? Server sends nonce, TPM generates quote with the nonce and sends it back, server validates the quote using the TPMs public key?
I like the TPM quote idea, and generation and verification are already built into existing TPM libraries. Should mean that the actual verification logic won't be too hard to write.
As far as the autogenerated ID goes, it could be something like the join token where if no spiffeID is specified when the key is entered, the hash of the public key is used. Otherwise, it uses the specified spiffeID. We could also have the node provide data like a FQDN on attestation. In that case, though, I'd be concerned about collisions between nodes.
Is that interface for the key value store published anywhere? That sounds like the way to go for storing keys to me.
Unfortunately, by "noodled" (poor word choice on my part), I meant that we've discussed (but not implemented) such an interface.
Ah gotcha. Are there any other ideas on how to store public keys? If not, this is blocked until that interface gets implemented.
an interface that gives plugins access to a namespaced key value store riding on top of the datastore, as well as an interface through the registration API for plugins to receive custom payloads. In this way, the CLI (via the registration API) could direct public keys into the plugin, which could store them back to its namespace in the data store.
This is pretty far from being a reality at the moment. I'd suggest we not block this work on it being available since it will take some time to get in (we don't really even have a design doc or anything yet for it).
We might want to start with something simple, like pointing to a flat file with a list of TPM information? We can reload this file periodically without having to restart SPIRE server.
@Pwpon500 how would you like to see this configured, ideally? It would help if we could understand a little bit more about where this information comes from and how. For instance, how often is it updated?
I like the TPM quote idea, and generation and verification are already built into existing TPM libraries. Should mean that the actual verification logic won't be too hard to write.
Any thoughts on asserting register state here? Perhaps optionally?
it could be something like the join token where if no spiffeID is specified when the key is entered, the hash of the public key is used. Otherwise, it uses the specified spiffeID.
This SPIFFE ID would need to be bound to the TPM - is this information that we would know in advance? The SPIFFE ID that a serial number should eventually take on? From a configuration perspective, it seems like the easiest thing to do would be to bulk load a list of hardware you know to be yours. I guess this is related to the above question about use - at what point would you get to know that hardware X is destined for location/purpose Y?
A SPIFFE ID that included the serial number of the TPM is another option. That would move the SPIFFE ID assignment time from configuration to registration API call.
This information would need to be updated whenever a new machine needs to be provisioned. This is relatively frequent - at least a few times an hour. This is why I'm leaning towards a datastore option that can be easily added to. Instead of a flat file, we could use a directory that the spire server has read/write access to. Each public key would be stored as a .pub file. Then, the spire server could easily add and remove keys from the directory. It seems to me that this would be easier than having a flat file that the spire server has to append to and remove from.
This information would be registered by whatever human is starting the provision for the box, probably through a tool that registers the information through the spire api.
For registering state, if the agent registers with a public key that's already been registered, I think it makes the most sense to just hand out the SVID for the already-active registration to the agent. We could also add an option to evict the old registration and make a new one or reject the registration request. These wouldn't play nice if a box has to be wiped and brought back up, however.
Because of the frequency with which new boxes are provisioned, I don't think a bulk load would work. I do like the idea of using the serial number for the TPM in the SPIFFE ID, though. It would definitely take configuration load off the person provisioning the box and make the SPIFFE IDs more easily predictable.
This information would need to be updated whenever a new machine needs to be provisioned. This is relatively frequent - at least a few times an hour. ... Instead of a flat file, we could use a directory that the spire server has read/write access to. Each public key would be stored as a .pub file. Then, the spire server could easily add and remove keys from the directory. It seems to me that this would be easier than having a flat file that the spire server has to append to and remove from.
Sure, that works. One thing to keep in mind is if you have more than one SPIRE server, file-based persistence may complicate things. Unfortunately, until we have a persistence service we can expose to plugins, I can't think of anything better. We will be working in this direction, but it is likely major versions and months (not weeks) away.
This information would be registered by whatever human is starting the provision for the box, probably through a tool that registers the information through the spire api.
It will be hard to get this wired through an official SPIRE API the way we are currently set up. This might be possible in the future with the addition of new plugin features, but it will be quite some time.
One option is to prototype this as an out-of-process plugin that exposes its own API. It can be configured with a port number and a list of SPIFFE IDs authorized to access it. SPIRE server offers a service to plugins that allows them to retrieve SVID and bundle information. This would provide the plugin with everything it needs to start its own API and secure it (as well as authenticate its clients). Combined with file-based public key discovery, I think this ought to cover you for now?
For registering state, if the agent registers with a public key that's already been registered, I think it makes the most sense to just hand out the SVID for the already-active registration to the agent. We could also add an option to evict the old registration and make a new one or reject the registration request. These wouldn't play nice if a box has to be wiped and brought back up, however.
So long as access to the TPM is controlled and restricted to only privileged users, I think it is safe to re-attest. IIUC, TPM access is restricted to root by default; however, multiplexing agents may expose APIs that are accessible to unprivileged users - is this very common? If so, perhaps re-attestation should be disabled by default and enabled via a configuration option.
One thing that hasn't been explicitly stated but I believe has been alluded to and would benefit from an explicit statement (to remove any ambiguity) is the following:
The initial registration of a TPM's public key must be an out-of-band action that cannot be automatically done from the machine itself. That is, when a new machine is provisioned from bare metal it cannot itself call into this new API to register its TPM's public key.
A TPM would be untrustable by SPIRE until its public key exists in the public key store so the untrusted machine cannot be the provider of a key to then trust. This is not to say that an automated provisioning process cannot register public keys via the API but only that the source of the TPM <--> public key
mapping will need to be some alternative trusted entity used during that provisioning process.
AFAIK, any unprivileged user being able to access the TPM is very uncommon, so I think it's safe to re-attest by default and optionally disable re-attestation.
That plan sounds like it'll cover things for now. Could you point me to the service you're referring to?
The initial registration of a TPM's public key must be an out-of-band action that cannot be automatically done from the machine itself. That is, when a new machine is provisioned from bare metal it cannot itself call into this new API to register its TPM's public key.
Yes, I believe that has been the intention thus far, thank you for calling it out explicitly.
Seeing it called out provoked one further thought on my part: currently, most of the cloud-based attestors do not require each machine to be registered beforehand. Instead, the attestor is generally configured with an account number, and we assert that the machine belongs to one of the configured accounts. This lowers the overhead - we automatically issue agents an identity equal to the identity of the cloud instance.
There may be some convenience in a similar function here, but I'm not sure if/how it may be done securely (perhaps if there was some sort of customer-level key provisioning? AIK-based certificate? I really don't know). I will note that in these cases, the agents can receive an identity in the trust domain, but further identity issuance must still be accomplished via creation of a registration entry. Agent identities are always issued in the spiffe://example.org/spire/agent/*
namespace. Anyways, I just figured it was worth a mention, didn't mean to take this conversation off the rails.
AFAIK, any unprivileged user being able to access the TPM is very uncommon, so I think it's safe to re-attest by default and optionally disable re-attestation.
Sounds good to me. Are you accessing the TPM hardware directly or are you using software on top of it to control access?
That plan sounds like it'll cover things for now. Could you point me to the service you're referring to?
Here is the proto that SPIRE server exposes to its plugins, and here is an example of it being consumed.
@azdagron any idea if we need to do anything special to expose this host service to node attestor plugins?
Had a conversation today with @Pwpon500 on this topic.
The current thinking is that the server side node attestor plugin will be configured with the CA certificates needed for validating an EK cert. The quote will be verified using the key in the EK cert, and the EK cert will be verified using the configured certificates. This should be enough for us to assert the identity of the TPM hardware being attested. The resulting SPIFFE ID would reflect that (e.g. by referencing manufacturer, serial number, or public key etc).
Taking this approach negates the need for a plugin-specific API for registering public keys, and works well with the currently established node attestor patterns. Further use would be made through the creation of registration entries, per typical SPIRE interaction.
@Pwpon500 please correct me here if I mangled any of this.
Some open questions floating around my mind:
- How should the resulting SPIFFE ID be formed? It should be "predictable" (i.e. given a serial number or a public key or something I am likely to have, I can build the SPIFFE ID for that hardware)
- What additional selectors (if any) should be generated by this process?
That all looks good to me.
As far as the SPIFFE ID goes, I'm hesitant to support something like manufacturer or serial number unless there's a way to consistently get that information out of the TPM. I can't find it in the TPM spec: https://trustedcomputinggroup.org/wp-content/uploads/TCG-TPM-v2.0-Provisioning-Guidance-Published-v1r1.pdf. My current implementation uses a sha256sum of the public key for the ID, but I'm not attached to that implementation. The SPIFFE IDs get pretty ugly. If someone knows a Go library that can get manufacturer/serial number information from TPMs, please do chime in.
For selectors, one option would be to look at data inside the EK cert. Since it's signed by the CA, we know that data to be valid. Selectors could be created based off data inside the cert (SANs, issuer, subject, etc.).
Actually, after looking at an EK cert a little bit more, I realized we could use the serial number field in the cert. We'd have to combine this with the name of the CA since different CAs can hand out certs with the same serial number. One option for implementing this is:
User configures a map of CA name to bundle path:
NodeAttestor "tpm" {
    plugin_cmd = "/opt/spire/.data/tpm_attestor"
    plugin_data {
        ca_files = {
            "provider_1": "/path/to/bundle1.pem",
            "provider_2": "/path/to/bundle2.pem",
        }
    }
}
Then, if a node attests with a cert that matches provider 1 and the cert's serial number is 1337, the SPIFFE ID spiffe://example.org/agent/tpm/provider_1-1337 would be generated.
I don't like that we're requiring the user to name their CAs, but I don't really see any other good option. I think the only CA-unique identifier is the CN, but that is way too long (and contains spaces) to put in the SPIFFE ID.
Thoughts on this?
As far as the SPIFFE ID goes, I'm hesitant to support something like manufacturer or serial number unless there's a way to consistently get that information out of the TPM.
I had originally suggested something along these lines because I would expect an operator to have that information and thus be able to easily form or predict the SPIFFE ID for a particular piece of hardware. At the end of the day, an operator will have to register something and refer to this node somehow... the easier we can make it to refer to/address the agent's SPIFFE ID, the better. Another thing here too is that we have some audit logging which emits the SPIFFE ID of the agent when requesting SVIDs etc, so being able to back your way into a specific node or instance based on the agent's SPIFFE ID is also valuable.
Actually, after looking at an EK cert a little bit more, I realized we could use the serial number field in the cert. We'd have to combine this with the name of the CA since different CAs can hand out certs with the same serial number ... I don't like that we're requiring the user to name their CAs, but I don't really see any other good way to name CAs. I think the only CA-unique identifier is the CN, but that is way too long (and contains spaces) to put in the SPIFFE ID.
Yea, I feel the same... Serial number can also be long and random, and it might still be hard to associate the SPIFFE ID to a particular piece of hardware? I think you would know better than I would on this point - I'm not sure what sort of information is normally kept in inventory systems (I assumed serial number or something should be pretty standard hence that being my original suggestion).
Are there any hard requirements about exactly what kind of information MUST be included in an EK cert? I skimmed through the spec you linked but didn't see anything along those lines. For instance, if we find that the EK cert from manufacturer X includes some useful information that we can use to form the SPIFFE ID, can we reasonably expect the same information to be present on EK certs from other manufacturers?
Unfortunately, there isn't any requirement for EK certs to have any useful identifying information beyond typical x509 certificate attributes. The certs I'm finding when I do testing are pretty barebones.
When talking to Phil Vachon, he mentioned that the readability of the original SPIFFE ID may not actually be all that relevant if it's being mapped to a more well-known name, i.e. the original ID being something like spiffe://example.org/spire/agent/tpm/hash/NT2IZHvCqVg+5PigPNqp3ySfXrCUomqgRIYW/27ZmH0=
if it can be automatically mapped to something like spiffe://example.org/spire/agent/tpm/name/us-east-server-1
using a user-provided post-register hook of some sort.
In this case, there would be two paths for users:
- Use the tpm plugin standalone and use the hash-based SPIFFE ID directly
- Use the tpm plugin and provide another plugin that reaches out to some inventory system and creates another SPIFFE ID with a more well-known name.
Also, if the sha256sum is too long, we can truncate it to 16 bytes.
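The hash-based ID with optional truncation can be sketched like this. Names are hypothetical; note that the example above uses standard base64 ('+', '='), while this sketch uses the URL-safe unpadded alphabet as one way to keep the ID path-friendly, which is an assumption on my part, not established behavior.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// hashID sketches the scheme: SHA-256 of the DER-encoded public key,
// optionally truncated to 16 bytes to keep the resulting ID shorter.
func hashID(trustDomain string, pubKeyDER []byte, truncate bool) string {
	sum := sha256.Sum256(pubKeyDER)
	b := sum[:]
	if truncate {
		b = b[:16]
	}
	return fmt.Sprintf("spiffe://%s/spire/agent/tpm/hash/%s",
		trustDomain, base64.RawURLEncoding.EncodeToString(b))
}

func main() {
	// Placeholder bytes standing in for a real DER-encoded EK public key.
	fmt.Println(hashID("example.org", []byte("example-public-key-der"), true))
	fmt.Println(hashID("example.org", []byte("example-public-key-der"), false))
}
```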
Hello! :wave:
Some folks and I are interested in reactivating this discussion and happy to contribute to building a native implementation of TPM node attestation in SPIRE. It would be valuable for SPIRE to natively support some form of node attestation backed by these devices to provide identities with hardware-based cryptographic guarantees.
We have been reading this thread and doing some investigation and analysis. We’d like to propose an approach following the recently published TCG draft “TPM 2.0 Keys for Device Identity and Attestation” [1] (under review), which applies the “IEEE Standard for Local and Metropolitan Area Networks, Secure Device Identity (802.1AR)” [2] device identity module definition and formatting to keys protected by a TPM 2.0. This new approach can also include the idea outlined so far based on the EK certificate when DevIDs are not available. I’ll add the proposal below.
(Co-authored with: @adriane-cardozo and @langbeck)
Background
The IEEE 802.1 working group defined the IEEE 802.1AR Standard [2] for Local and Metropolitan Area Networks Secure Device Identity. This standard defines a secure device identifier (DevID) as “a cryptographic identity that is bound to a device and used to assert the device’s identity”. The initially installed identity is defined as an IDevID (“I” for initial) and subsequently locally defined identities are LDevIDs (“L” for local). An IDevID will be created at manufacturing time and provides proof that this device has been manufactured by a certain manufacturer - it is intended to be usable for the life of the product. An LDevID is created on the Administrator’s premises and provides proof that this device is owned by a certain enterprise (or individual) - it is not expected to be a long-lived certificate.
A Trusted Platform Module, also known as a TPM, is a cryptographic coprocessor that is present on most commercial PCs and servers. It was designed by a computer industry consortium called the Trusted Computing Group (TCG) as a dedicated microcontroller to secure the hosting platform, providing protection (via physical isolation from the CPU/memory) for cryptographic keys, for example. The TPM is becoming a strategic asset for computer owners to defend their cryptographic assets.
The IEEE 802.1AR Standard can be used together with TPM-based keys and certificates as described in [1]. It applies the IEEE Standard 802.1AR device identity module definition and formatting to keys protected by a TPM 2.0. It addresses ways to incorporate TPM 2 created keys into solutions that protect device identities and help prevent a “lying endpoint”. TPM is a secure Root of Trust for Storage (RTS) that provides protection for private keys, preventing use of keys from one device on another device or with another TPM. The security of the provisioning of DevID keys is anchored in the Endorsement Key (EK) and its certificate (issued by the TPM manufacturer).
Related work
Some interesting work has been done in this field. In particular, the external node attestor plugin developed by Bloomberg [3].
This plugin makes use of the EK certificate (the certificate associated with the TPM’s EK) to prove the identity of the TPM device. The agent generates an AK (attestation key) using TPM and sends the EK certificate and AK attestation parameters to the server, which verifies the certificate using the manufacturer CA and issues a challenge using the EK public key and AK attestation parameters. Then, the agent solves the challenge using the TPM interface to prove the possession of the private key. Finally, the server builds the SPIFFE ID and one selector using the hash of the public EK.
The authors of this proposal are not aware of other public TPM plugins at the time of writing this document.
Proposal
We propose to leverage DevID certificates (LDevID or IDevID) to authenticate each device; this also works with other certificates whose private keys are generated and secured by a TPM. This proposal looks into a generalized method that does not restrict the attestation to the certificate provided by the TPM manufacturer but potentially allows the agent to use a certificate minted by the organization's CA. In this way, the operator would have more control not only over the certificate signature but also over the values spawned in selectors.
To achieve that goal, we could take advantage of the TPM secure key generation feature. TPMs can securely generate new cryptographic keys: those keys are only available to the TPM (private key material never leaves the device in plain form).
Protection of the generated keys is rooted in a primary key created by the owner of the TPM. When a key leaves the TPM (in order to be loaded and used later) it is wrapped (encrypted) by its parent key.
The public keys can be included in custom certificates signed by the CA of preference instead of relying on the manufacturer certificate and CA.
With that in mind, a similar approach to the one used in the x509pop attestor can be taken, but where the private key associated with a certificate is generated and secured by the TPM, following the security guidelines and templates recommended in [1]. By using those templates we leverage the security analysis done by some of the world's foremost TPM experts. A potential workflow is presented below:
[A] Preconditions

We are following [1] for DevID creation. The TPM’s end-user must first take ownership of the TPM. Essentially, this process consists of:

- Creating a primary encryption key in the TPM owner hierarchy. This key is known as the Storage Root Key (SRK).
- Creating a signing key under the SRK.
- Creating a certificate signing request using the public signing key, and asking an internal, trusted CA to sign it. This process also uses and verifies the EK certificate.

At the end of this process, the end-user has:

- A certificate including the public signing key, signed by its internal CA.
- The private key, in persistent storage, encrypted by the SRK.
[B] Proposed attestation workflow

- Agent loads the certificate and encrypted private key from disk into memory.
- Agent creates an SRK in the TPM.
- Agent loads the encrypted private key from memory into the TPM under the SRK.
- Agent sends the certificate to the server.
- Server verifies the certificate using the CA bundle and sends a challenge (nonce) to the agent.
- Agent solves the challenge by having the TPM sign the nonce (the agent also contributes to the nonce).
- Server verifies the signature.
[C] Agent SPIFFE ID and selectors
The proposed agent SPIFFE ID is based on the certificate fingerprint. It provides uniqueness and consistency with other existing node attestors (x509pop, sshpop):
spiffe://<trust domain>/spire/agent/tpm2/<fingerprint>
A good number of selectors can be extracted from the certificate fields. Some of them can be: the certificate serial number, subject attributes and issuer attributes.
tpm2:serialnumber:<serial-number>
tpm2:subject:cn:<subject-common-name>
tpm2:issuer:cn:<issuer-common-name>
There is a tradeoff between using IDevIDs and using LDevIDs or other custom certificates. Leveraging IDevIDs means we leverage the platform manufacturer's PKI but can use as selectors only the manufacturer information present in the certificate, while using LDevIDs or other custom certificates means the customer needs to run and protect its own PKI but can customize the certificate attributes as required, thus creating additional selectors.
[D] Combined approach
It may also be possible to allow attestation using the endorsement certificate and the manufacturer CA, as described in the discussion so far and in the “Related Work” section. Even though it is a less flexible approach, it does not require the TPM to be previously provisioned by the owner and could be handy in some cases.
One way to accomplish this combined design is to check whether a certificate path is provided in the plugin configuration or not. If it is not provided, the plugin could attest using the TPM’s endorsement certificate.
Another option is to create two different node attestors, one for attestation using EK certificates, and another one for attestation using custom certificates.
Request for comments
This proposal tries to lay out a possible approach to provide native support to node attestation using TPM devices. Any feedback on the general direction of this proposal, any missing points, suggestions, or thoughts in general is greatly appreciated.
References
[1] TPM 2.0 Keys for Device Identity and Attestation
[2] IEEE Standard for Local and Metropolitan Area Networks - Secure Device Identity
We got some interesting feedback in the last SIG-SPIRE meeting. I believe it can be summarized in two main points. I’ll write them below to follow up the discussion:
- Consider exposing PCR values as node selectors.
- Consider using a challenge that ensures the key is coming from a real TPM.
I have been talking with @adriane-cardozo and @langbeck about this. Some thoughts we have are:

[Point 1] Since it is not possible to quote using DevIDs (which are unrestricted signing keys), we would have to load an AK credential (certificate + key) from disk in addition to the DevID. However, we could make it optional:
- If we have an AK credential configured, we read and quote PCRs (TPM2_CC_PCR_Read, TPM2_CC_Quote).
- If we don’t have an AK credential, just read the PCR values from the TPM using TPM2_CC_PCR_Read. This returns the PCR values, but with no security guarantees (TPM does not sign the returned data). An attacker could MITM the communication with the TPM and forge arbitrary responses.
We should have a way to differentiate the exposed PCR selector names (verified or not). For example:
pcr:<pcr-number>:<pcr-bank>:<pcr-value> (verified: read + quote)
upcr:<pcr-number>:<pcr-bank>:<pcr-value> (unverified: read)
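The verified/unverified selector naming can be sketched as follows. The `pcr`/`upcr` prefixes come from the proposal above; the helper itself is hypothetical.

```go
package main

import "fmt"

// pcrSelector formats a PCR selector: "pcr" for values that were read and
// quoted (verified), "upcr" for values that were merely read (unverified).
func pcrSelector(verified bool, index int, bank, value string) string {
	prefix := "upcr"
	if verified {
		prefix = "pcr"
	}
	return fmt.Sprintf("%s:%d:%s:%s", prefix, index, bank, value)
}

func main() {
	fmt.Println(pcrSelector(true, 7, "sha256", "a1b2c3"))  // → pcr:7:sha256:a1b2c3
	fmt.Println(pcrSelector(false, 7, "sha256", "a1b2c3")) // → upcr:7:sha256:a1b2c3
}
```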
[Point 2] The provisioning process of DevIDs already involves checking the EK certificate and issuing a challenge that can only be solved by the TPM that holds the EK. This is called “TPM-residency assessment” by the DevID spec.
The local CA (DevID provisioning service) is the one doing the residency assessment and the plugin doesn’t need to worry about that. DevID spec shifts this verification to the key provisioning phase so, as long as you trust that your provisioning tool hasn’t been compromised, you can trust that challenges signed by keys provisioned by your infrastructure do actually reside on a real TPM.
We made a fork of the Bloomberg TPM plugin. We added support for issuing SVIDs based on the TPM public key hash. This seems to work very well in our use case. More importantly, we added a full test suite with a simulated TPM.
The plugin uses TPM credential activation as the method of attestation. The plugin operates as follows:
- Agent generates AK (attestation key) using TPM
- Agent sends the AK attestation parameters and EK certificate or public key to the server
- Server inspects EK certificate or public key
- If hash_path exists, and the public key hash matches filename in hash_path, validation passes
- If ca_path exists, and the EK certificate was signed by any chain in ca_path, validation passes
- If validation passed, the server generates a credential activation challenge using
- The EK public key
- The AK attestation parameters
- Server sends a challenge to agent
- Agent decrypts the challenge's secret
- Agent sends back decrypted secret
- Server verifies that the decrypted secret is the same it used to build the challenge
- Server creates a SPIFFE ID in the form of spiffe://<trust_domain>/agent/tpm/<sha256sum_of_tpm_pubkey>
- All done!
We would be happy to discuss working with the SPIRE team and Bloomberg to upstream this fork.
In our real-world use of the BoxBoat fork, we found that not all TPMs ship with an EK certificate.
If I recall correctly, some Intel TPMs return a URL and require internet access to download their EK Certificate, which is a non-starter in air-gapped networks.
Using EK Public Key Hash was far more reliable, although it does require building an Allow List of every authorized TPM on the Spire Server.
Hey folks - I've been talking to @colek42 recently about reviving this effort to upstream the TPM plugin. He has been using it now for quite some time, and his team has made some interesting discoveries like the one mentioned here we found that not all TPMs ship with an EK Certificate.
Since this plugin has already had some miles put on it, I don't think we need to spend too much time in this issue talking about the implementation (we can save that for PR?). Instead, I think it will be useful to get consensus here on the experience and feature set. Below are a few things I can think of - anything else that comes to mind for you all?
- PCR Selectors: IMO the most powerful aspect of a TPM is its ability to measure boot state. As we look towards something like https://github.com/spiffe/spire/issues/2203, this can become a very strong assurance we can make during SVID issuance, and one that we can enforce "continuously". I think a simple way to expose this could be for the server plugin to emit each PCR as a selector.
- Public Key Allowlist: @colek42 and his team have found a strong need to build and manage a TPM pub key allow list. My understanding of this need is that it arises from the fact that EK cert is not available in many cases. Without EK cert, I cannot think of any way to onboard a new agent safely, because we cannot prove that the TPM is authentic.
  Currently, I think they are using a file-based mechanism wherein the plugin watches the filesystem for updates that add or remove authorized pub keys. Since this is presumably an operation that happens frequently, plugin config requiring server restart is probably a nonstarter. Filesystem watch isn't my favorite thing in the world, but I also don't have any better options. Perhaps the ability to specify a TLS-protected endpoint from which the plugin can periodically fetch an allow list? That will work well in k8s-like environments and also allow for central update of the list ... on the other hand, it must be hosted somewhere :) @azdagron @amartinezfayo any thoughts on this one?
- Fetching EK Cert when not present: @caleblloyd reports that when EK cert is not present, the TPM instead returns a URL from which the EK cert can be fetched. I assume this URL is protected with web PKI? It probably makes sense for many users if this plugin could go fetch the referenced EK cert for validation, eliminating the need to manage pubkey allowlist. I realize that this is not always possible, so this behavior can only be triggered if the plugin is not configured for file-based allowlist... thoughts?
I'd love to hear what everyone thinks about the above, and if there are any other special points of consideration regarding how folks will use this plugin and what its features should be. I think once we get some rough consensus on the above points, especially on how we manage pub key allowlists, we can go to PR.
@colek42 can you help us understand a little more about the following part?
Currently, I think they are using a file-based mechanism wherein the plugin watches the filesystem for updates that add or remove authorized pub keys.
Approximately how many keys are we needing to track here? Order of magnitude is probably good enough.
I think you mentioned something about efficient file-based lookup as well? Can you share a bit about how this works and some of the design decisions made there?
Public Key Allowlist
https://github.com/boxboat/spire-tpm-plugin#hash-directory
The authorized hashes are stored as empty files, where the filename is the authorized hash. The server plugin just calls stat on the filename to see if the TPM hash is authorized. There is no filesystem watching or server reloading involved. And the filesystem backing this hash directory can be anything: NFS, a Kubernetes ConfigMap/Secret, a Git repo pulled in a loop, etc.
https://github.com/boxboat/spire-tpm-plugin/blob/f2f64a46bd54dcf6937e88dcc8c09da644989121/pkg/server/server.go#L136-L141
Manufacturer Certificate Directory
I didn't find this method all that useful - if I authorize Intel's root cert then anyone can go out and buy a box with an Intel TPM and run it on my network, and they'll be able to attest a rogue node.
The disk-path based solution isn't my favorite, but I am struggling to come up with something better all around. It is a very simple approach and does have a generous amount of flexibility. We could augment the attestor at a later point if something more palatable was discovered; I don't think it's worth holding the show up.
One observation is that some filesystems (virtual or otherwise) have poor performance on large directories. We may consider distributing the hashes into prefix buckets (00, 01, .... ff).
I liked the idea mentioned of supporting a JSON-based HTTP endpoint that responds with a list of pub keys.
It could be as simple as an Nginx container that's mounted a ConfigMap containing the list of pub keys in the API JSON format. This lets you 'build' the list of pub keys and ship it as an artifact (similar to the file watcher idea). Or it could be more complex like a custom app that queries an inventory system to render out the response.