RFC: Workload attestor fatal errors
At present, if an attestor encounters an error while attesting a workload, it is treated as if the attestor returned 0 selectors. This behaviour can be harmful, as it results in a workload failing to be granted its full set of identities. There are, however, cases where it might be desirable. This proposal explores whether it is possible to differentiate between the two cases.
Assume a configuration in which a node runs a mix of docker and non-docker workloads. The spire-agent is configured with the docker attestor and the unix attestor.
If workload A is a non-docker workload, then attestation should succeed even if the docker daemon is unavailable. This is the case today: even if the docker attestor attempts to connect to dockerd and fails, the error will be treated as if no selectors were returned.
However, if workload B is a docker workload, then today attestation will succeed even if the agent receives an error while attempting to communicate with dockerd. Workload B will end up with no identities, or only a partial set (if the unix attestor produces useful selectors for it).
Why is this a problem?
The Workload API disconnects a client that has 0 identities, but this does not protect a client that expects multiple identities granted via selectors from different attestors. It may still receive >0 identities, just not all of the identities it needs.
The SDS API is not able to disconnect a client in the same way, as a client with 0 identities may still retrieve the trust bundle. This leads to SDS clients being "stuck", waiting for identities on a connection where attestation was only partially successful. #2020 disconnects an SDS client that specifically requests an identity it doesn't have access to, but it is easy to imagine cases where this would be undesirable. For example, if a newly created registration (or federated bundle) has not yet propagated to the agent, envoy will be disconnected for asking for it, even though it could have begun operation with the other identities/bundles requested in the same SDS request (SDS would eventually send the new one once it propagates).
What can we do differently?
It may be possible to differentiate between conditions where an attestor failed because the workload is unsuitable for the attestor, and conditions where it failed on a suitable workload due to an unexpected error. In the former case the result is an empty list of selectors, but in the latter case it seems reasonable to abort attestation with an error and disconnect the client.
Both the docker and k8s workload attestors have a clear way to recognise that a given process should be of interest to them:
A docker workload will be in a cgroup whose name starts with docker
A k8s workload will be in a cgroup whose name contains kubepod
(The actual patterns matched are even more specific.)
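As a rough illustration of the kind of signal described above, the patterns below show how cgroup names might be matched. The regular expressions here are deliberately loose sketches, not the actual patterns the SPIRE attestors use (which, as noted, are more specific):

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative patterns only; the real attestors match more specific
// cgroup layouts than these.
var (
	dockerCgroupRe  = regexp.MustCompile(`^/docker/`)
	kubepodCgroupRe = regexp.MustCompile(`kubepod`)
)

func main() {
	// A docker-style cgroup name matches the docker pattern.
	fmt.Println(dockerCgroupRe.MatchString("/docker/abc123"))
	// A kubepods-style cgroup name matches the kubepod pattern.
	fmt.Println(kubepodCgroupRe.MatchString("/kubepods/besteffort/pod1"))
	// A plain unix process matches neither, so neither attestor
	// would consider it its responsibility.
	fmt.Println(dockerCgroupRe.MatchString("/user.slice/session-1"))
}
```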
So, for example: if the k8s attestor is configured, and a process is in a cgroup matching the kubepod pattern, then a failure to find a matching pod via the kubelet API after the maximum number of attempts should be an error. The attestor is certain that the workload is a Kubernetes container (no non-k8s process would reasonably be in a kubepod-named cgroup), but it is not possible to map it to a Kubernetes pod.
If the process is not in a matching cgroup then, as today, it is not an error, just an empty AttestResponse.
There are, however, likely to be error conditions that justify logging a warning but should not cause attestation to fail. One example might be a failure when attempting to open "/proc/%v/cgroup" on a system that does not support cgroups. The docker/k8s attestors cannot possibly function on such a system, so a log message seems warranted, but attestation should not fail, as the process is not unambiguously the responsibility of either attestor.
Implementation
The simplest implementation would be to make any error returned by a workload attestor fatal to the attestation process. If an attestor encounters a "soft" error condition, it should log it and return an empty AttestResponse rather than an error.
Existing attestors would need updating to only return errors where they strongly believe that attestation would have reasonably succeeded had the error not been encountered (e.g. a kubepod cgroup, but unable to look up the relevant pod).
This change would be easy to make for the in-tree attestors, but may cause problems for anyone who has out of tree plugins.
An alternative would be to add a new error type to allow an attestor to return a fatal error. By default, errors returned from the attestors would be handled as today, but an error marked as fatal would immediately end the attestation process and disconnect the client. This would avoid needing to update all existing plugins, but at the cost of a slightly ambiguous interface.
Alternatives
Some ability to configure attestors as "required". Any error returned by a "required" plugin would abort workload attestation. This solves the problem for a typical k8s node where k8s is the only relevant attestor, but in the docker+unix mixed workload case it could be undesirable to mark the docker attestor as required.
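In agent configuration this alternative might look something like the sketch below. The `required` flag is hypothetical and not an actual SPIRE configuration option:

```hcl
plugins {
    WorkloadAttestor "docker" {
        plugin_data {}
        # Hypothetical flag for this alternative; not a real SPIRE option.
        # Undesirable on a mixed docker/non-docker node, as any docker
        # attestor error would then abort attestation of unix workloads too.
        required = true
    }
    WorkloadAttestor "unix" {
        plugin_data {}
    }
}
```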
Your ideas go here:
Thank you for putting some thought into this @JonathanO - I think we're long overdue for defining a behavioral contract between agent core and workload attestor plugins.
The behavior you're describing sounds reasonable. It is convenient that for both docker and k8s we can get a strong signal that we should be able to attest a particular workload - I wonder if that will broadly be the case though. Relatedly, could there ever be a situation in which a k8s workload was registered with non-k8s selectors?
In reading through this proposal and thinking a little more about the actual problem, I don't think it's limited to SDS. The SPIFFE Workload API will hang up on a caller with no identities, but a caller with multiple identities may still receive only a partial set if a workload attestor encountered a problem along the way. Similar to the Envoy case, this caller would never receive a subsequent update with the full set and be "stuck".
While I do think your proposed behavior is reasonable, I can't help but wonder "why not return everything we possibly can?". The k8s-workload-with-no-k8s-selectors case comes to mind here. Since the real problem seems to boil down to callers getting "stuck" after transient attestor failures, I wonder if one alternative solution could be to finally bite the bullet on re-attestation... If SPIRE Agent periodically re-attests workloads, our stuck caller would presumably become unstuck as soon as we were able to successfully attest it.
This issue is stale because it has been open for 365 days with no activity.
This issue was closed because it has been inactive for 30 days since being marked as stale.