policy-server seems to pull OCI artifacts from the registry regardless of hot cache
After deploying Kubewarden with the Audit Scanner enabled and configured to run every 2 minutes, and deploying the verify-image-signatures policy configured to verify Application Collection images following the instructions in https://github.com/kubewarden/docs/pull/443,
it seems that the PolicyServer still queries the OCI registry instead of reading from its cache when calling: https://github.com/kubewarden/policy-evaluator/blob/3cd66b932b199037e677e3e204d4d9742e23edc8/src/callback_handler/sigstore_verification.rs#L251-L266
Acceptance criteria
Verify that policy-server cache for context-aware calls is correctly configured. Configure the cache in policy-evaluator as needed. Add tests as needed.
I've tested the code. Everything is working as described:
- the results obtained from the registry are cached for 1 minute
- only successful results are cached
If a container image is not signed, fetching its signature will fail. Hence, whenever a workload uses an unsigned image, we will keep reaching out to the remote registry until a signature blob is found.
In the setup described above, the audit-scanner performs an assessment every 2 minutes, which means the cache is always empty when the scanner starts. However, if multiple workloads are using the same image, the remote registry is interrogated only once. Keep in mind the cache is specific to each policy-server instance: when running multiple policy-server instances, each one of them will reach out to the registry for the same image, but each one will do that only once.
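To make the behavior above concrete, here is a minimal sketch of a TTL cache that stores only successful lookups. This is not the actual policy-evaluator code; the `SignatureCache` type, the `fetch` closure, and the hard-coded 60-second TTL are illustrative assumptions:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Illustrative TTL cache: only successful lookups are stored and
/// entries expire after a fixed interval (60 seconds here, matching
/// the behavior described above).
struct SignatureCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, Vec<u8>)>,
}

impl SignatureCache {
    fn new(ttl: Duration) -> Self {
        Self {
            ttl,
            entries: HashMap::new(),
        }
    }

    /// Return the cached signature blob if it is still fresh.
    fn get(&self, image: &str) -> Option<&Vec<u8>> {
        self.entries
            .get(image)
            .filter(|(stored_at, _)| stored_at.elapsed() < self.ttl)
            .map(|(_, blob)| blob)
    }

    /// Hit the registry only on a cache miss. Errors are returned
    /// as-is and never cached: this is why unsigned images trigger a
    /// registry round trip on every evaluation.
    fn get_or_fetch(
        &mut self,
        image: &str,
        fetch: impl FnOnce(&str) -> Result<Vec<u8>, String>,
    ) -> Result<Vec<u8>, String> {
        if let Some(blob) = self.get(image) {
            return Ok(blob.clone());
        }
        let blob = fetch(image)?; // a failed fetch caches nothing
        self.entries
            .insert(image.to_string(), (Instant::now(), blob.clone()));
        Ok(blob)
    }
}

fn main() {
    let mut cache = SignatureCache::new(Duration::from_secs(60));
    // The first call reaches the "registry"; the second is served from cache.
    for _ in 0..2 {
        let result = cache.get_or_fetch("registry.local/nginx:1_alpine", |img| {
            println!("fetching signatures for {img} from the registry");
            Ok(b"signature blob".to_vec())
        });
        assert!(result.is_ok());
    }
}
```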
We could provide a configuration knob that sets the cache expiration time.
@recena: do you have any opinion? I know the potential bug was reported by you.
Moving to blocked, waiting for feedback
I'm not sure if I understand the scenario, but:
- We should cache valid images → signed
- 1 minute for TTL is too short
We're caching the valid images, but we expire the cache after 1 minute. That's because someone in the meantime might overwrite a tag.
For example:
- 10:00:00: we verify `registry.local/nginx:1_alpine` and fetch data about it from the registry. This information is cached for X minutes.
- 10:02:00: someone overwrites `registry.local/nginx:1_alpine`.
- 10:05:00: we have to verify `registry.local/nginx:1_alpine` again. If the cache has expired, everything will be fine; otherwise we will reuse the details about the original image that was around at 10:00:00, which might be bad.
Right now we're being conservative, as befits a security project, and we let the cache expire after 1 minute.
I think we should allow the user to configure the cache expiration time. In this way the user could define a value that is the right tradeoff between the two cases (talking too much with a registry vs having stale data).
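Such a knob could be surfaced as a policy-server flag or environment variable. A minimal sketch, assuming clap for argument parsing; the `--sigstore-cache-ttl-seconds` flag, its environment variable, and the default value are hypothetical names, not existing policy-server options:

```rust
use clap::Parser;

/// Hypothetical configuration sketch; requires clap with the
/// "derive" and "env" features enabled.
#[derive(Parser)]
struct Config {
    /// How long successful sigstore lookups stay cached. Lower values
    /// shrink the window for stale data after a tag is overwritten;
    /// higher values reduce traffic towards the registry.
    #[arg(long, env = "SIGSTORE_CACHE_TTL_SECONDS", default_value_t = 60)]
    sigstore_cache_ttl_seconds: u64,
}

fn main() {
    let config = Config::parse();
    println!(
        "caching successful sigstore results for {}s",
        config.sigstore_cache_ttl_seconds
    );
}
```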
We're going to refine this card as part of 1.18, and work on this improvement during 1.19.
I would like to come up with a solution that allows the policy to configure the caching interval, so that the Kubernetes admin can pick a value they are comfortable with.
I propose to define a new host capability about caching. The new capability will allow the policy author to cache arbitrary data.
We will then update the verify-image-signatures policy to allow the Kubernetes admin to define the expiration criteria of the cached data.
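To sketch what that contract could look like: the payload shapes below are invented for illustration, and the real wire format, operation names, and semantics would be defined by the RFC mentioned in the steps below.

```rust
use serde::{Deserialize, Serialize};

// Hypothetical request for a `cache/set` host call. Kubewarden host
// capabilities are reached through waPC, so a real contract would
// also fix the (binding, namespace, operation) triple.
#[derive(Serialize, Deserialize)]
struct CacheSetRequest {
    key: String,
    /// Arbitrary policy-provided data.
    value: Vec<u8>,
    /// Expiration picked by the policy, and hence configurable by
    /// the Kubernetes admin through the policy settings.
    ttl_seconds: u64,
}

// Hypothetical response for a `cache/get` host call.
#[derive(Serialize, Deserialize)]
struct CacheGetResponse {
    /// `None` when the key is missing or the entry has expired.
    value: Option<Vec<u8>>,
}
```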
Steps required to solve this issue:
- [ ] Write an RFC about the caching host capability. In the RFC, mention we might want to introduce a flexible caching backend. Right now we only have an in-process memory cache; we might want to add support for something like Redis to have a cache that is shared among multiple instances of the policy-server
- [ ] Implement the caching host capability inside of policy-evaluator and propagate the change to policy-server and kwctl
- [ ] Adapt verify-image-signatures to make use of this new capability (a guest-side sketch follows this list)
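As a guest-side sketch, verify-image-signatures could accept a TTL in its settings and pass it along on every cache write. The `cache_ttl_seconds` setting, the request shape, and the `("kubewarden", "cache", "v1/set")` triple are all assumptions; only `wapc_guest::host_call` is an existing API.

```rust
use serde::{Deserialize, Serialize};

/// Hypothetical policy settings: `cache_ttl_seconds` does not exist
/// in verify-image-signatures today; it illustrates how the admin
/// could pick the staleness/traffic tradeoff per policy.
#[derive(Deserialize)]
struct Settings {
    #[serde(default = "default_cache_ttl")]
    cache_ttl_seconds: u64,
}

fn default_cache_ttl() -> u64 {
    60 // matches the current hard-coded expiration
}

#[derive(Serialize)]
struct CacheSetRequest {
    key: String,
    value: Vec<u8>,
    ttl_seconds: u64,
}

/// Sketch of a cache write through the hypothetical host capability.
fn cache_signatures(settings: &Settings, image: &str, payload: Vec<u8>) {
    let req = CacheSetRequest {
        key: format!("sigstore/{image}"),
        value: payload,
        ttl_seconds: settings.cache_ttl_seconds,
    };
    let msg = serde_json::to_vec(&req).expect("serializable request");
    // Failures are ignored here: caching is best-effort in this sketch.
    let _ = wapc_guest::host_call("kubewarden", "cache", "v1/set", &msg);
}
```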
I've created a new issue to keep track of the implementation of the cache host capability.