
policy-server seems to pull cached OCI artifacts regardless of hot cache

Open viccuad opened this issue 1 year ago • 6 comments

Setup: deploy Kubewarden with the Audit Scanner enabled and configured to run every 2 minutes, then deploy the verify-image-signatures policy configured to verify Application Collection images, following the instructions in https://github.com/kubewarden/docs/pull/443.

It seems that the PolicyServer still exercises the OCI registry instead of consuming from its cache, when calling: https://github.com/kubewarden/policy-evaluator/blob/3cd66b932b199037e677e3e204d4d9742e23edc8/src/callback_handler/sigstore_verification.rs#L251-L266

Acceptance criteria

Verify that policy-server cache for context-aware calls is correctly configured. Configure the cache in policy-evaluator as needed. Add tests as needed.

viccuad avatar Aug 21 '24 13:08 viccuad

I've tested the code. Everything is working as described:

  • the results obtained from the registry are cached for 1 minute
  • only successful results are cached

If a container image is not signed, fetching its signature fails. Hence, whenever a workload uses an unsigned image, we keep reaching out to the remote registry until a signature blob is found.
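The behavior described above (a 1-minute TTL, with only successful lookups cached) can be illustrated with a minimal sketch. This is not the policy-evaluator code: `TtlCache`, `get_or_fetch`, and the string-based result type are hypothetical names chosen for illustration, and the current time is passed in explicitly so the example stays deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical in-memory cache; a sketch, not the policy-evaluator implementation.
struct TtlCache {
    ttl: Duration,
    /// Maps an image reference to (time stored, cached result).
    entries: HashMap<String, (Instant, String)>,
}

impl TtlCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Returns the cached value for `image` if it is still fresh at `now`.
    /// Otherwise calls `fetch` (standing in for the registry lookup) and
    /// stores the result only when the lookup succeeds.
    fn get_or_fetch<F>(&mut self, image: &str, now: Instant, fetch: F) -> Result<String, String>
    where
        F: FnOnce() -> Result<String, String>,
    {
        if let Some((stored_at, value)) = self.entries.get(image) {
            if now.duration_since(*stored_at) < self.ttl {
                return Ok(value.clone());
            }
        }
        // Errors propagate here and are never cached, so an unsigned image
        // triggers a new registry lookup on every call.
        let value = fetch()?;
        self.entries.insert(image.to_string(), (now, value.clone()));
        Ok(value)
    }
}
```

With a 60-second TTL, two lookups of a signed image 30 seconds apart reach the registry once, while every lookup of an unsigned image reaches it again.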

In the setup described above, the audit-scanner runs a scan every 2 minutes, so every cache entry (1-minute TTL) has already expired by the time the next scan starts. Within a single scan, however, the remote registry is queried only once per image, even if multiple workloads use that image. Keep in mind that the cache is specific to each policy-server instance: when running multiple policy-server instances, each one will reach out to the registry for the same image, but only once each.

We could provide a configuration knob that sets the cache expiration time.

@recena: do you have any opinion? I know the potential bug was reported by you.

flavio avatar Sep 24 '24 14:09 flavio

Moving to blocked, waiting for feedback

flavio avatar Sep 24 '24 14:09 flavio

I'm not sure if I understand the scenario, but:

  1. We should cache valid images → signed
  2. 1 minute for TTL is too short

recena avatar Sep 24 '24 15:09 recena

We're caching the valid images, but we expire the cache after 1 minute. That's because someone in the meantime might overwrite a tag.

For example:

  • Time 10:00:00: we verify registry.local/nginx:1_alpine, we fetch data about it from the registry. This information is cached for X minutes
  • Time 10:02:00: someone overwrites registry.local/nginx:1_alpine
  • Time 10:05:00: we have to verify registry.local/nginx:1_alpine again. If the cache has expired, everything will be fine. Otherwise we will reuse the details of the original image from 10:00:00, which no longer match what the tag points to

Right now we're conservative, being a security project, and we let the cache expire after 1 minute.

I think we should allow the user to configure the cache expiration time. In this way the user could define a value that is the right tradeoff between the two cases (talking too much with a registry vs having stale data).
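The tradeoff can be made concrete with a small model of the timeline above. This is only an illustrative sketch: `observed_digest` and the digest strings are hypothetical, and times are expressed as offsets from the first lookup at 10:00:00.

```rust
use std::time::Duration;

/// Hypothetical model of a cached tag lookup. The first lookup happens at
/// offset 0 and is cached for `ttl`; the tag is overwritten at `overwritten_at`.
/// Returns the digest that a verification at `verify_at` would act on.
fn observed_digest(ttl: Duration, overwritten_at: Duration, verify_at: Duration) -> &'static str {
    if verify_at < ttl {
        // Cache entry still valid: the data fetched at offset 0 is reused,
        // even if the tag has been overwritten in the meantime.
        "sha256:original"
    } else if verify_at >= overwritten_at {
        // Cache expired: the registry is queried again and returns the new image.
        "sha256:overwritten"
    } else {
        // Cache expired, but the tag has not been overwritten yet.
        "sha256:original"
    }
}
```

With the tag overwritten at 2 minutes and a verification at 5 minutes, a 1-minute TTL yields `sha256:overwritten` (fresh data, more registry traffic), while a 10-minute TTL yields `sha256:original` (stale data, less registry traffic).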

flavio avatar Oct 03 '24 13:10 flavio

We're going to refine this card as part of 1.18, and work on this improvement during 1.19.

I would like to come up with a solution that allows the policy to configure the caching interval, so that the k8s admin can set a value they are comfortable with.

flavio avatar Oct 04 '24 13:10 flavio

I propose defining a new host capability for caching. The new capability will allow the policy author to cache arbitrary data.

We will then update the verify-image-signatures policy to allow the Kubernetes admin to define the expiration criteria of the cached data.

Steps required to solve this issue:

  • [ ] Write an RFC about the caching host capability. In the RFC, mention that we might want to introduce a flexible caching backend. Right now we only have an in-process memory cache; we might want to add support for something like Redis to have a cache that is shared among multiple instances of the policy-server
  • [ ] Implement caching host capability inside of policy-evaluator, propagate the change to policy server and kwctl
  • [ ] Adapt the verify-image-signatures policy to make use of this new capability
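As a rough sketch of what the steps above could lead to, the backend could sit behind a trait, with the caller (the policy) supplying the TTL. Everything here is hypothetical and only meant to seed the RFC discussion: `CacheBackend`, `InMemoryBackend`, and the method signatures are assumptions, not an existing Kubewarden API.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical backend abstraction: an in-process map today, possibly a
/// store shared among policy-server instances later.
trait CacheBackend {
    /// Stores arbitrary bytes under `key`; the caller chooses the TTL.
    fn set(&mut self, key: &str, value: Vec<u8>, ttl: Duration, now: Instant);
    /// Returns the value if it exists and has not expired at `now`.
    fn get(&self, key: &str, now: Instant) -> Option<Vec<u8>>;
}

/// Hypothetical in-process backend, private to one policy-server instance.
struct InMemoryBackend {
    /// Maps a key to (expiration time, payload).
    entries: HashMap<String, (Instant, Vec<u8>)>,
}

impl InMemoryBackend {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }
}

impl CacheBackend for InMemoryBackend {
    fn set(&mut self, key: &str, value: Vec<u8>, ttl: Duration, now: Instant) {
        self.entries.insert(key.to_string(), (now + ttl, value));
    }

    fn get(&self, key: &str, now: Instant) -> Option<Vec<u8>> {
        match self.entries.get(key) {
            Some((expires_at, value)) if now < *expires_at => Some(value.clone()),
            _ => None,
        }
    }
}
```

Passing the TTL on each `set` call is one way to let a policy such as verify-image-signatures forward an expiration value chosen by the Kubernetes admin; a shared backend would implement the same trait.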

flavio avatar Oct 24 '24 15:10 flavio

I've created a new issue to keep track of the implementation of the cache host capability.

jvanz avatar Oct 02 '25 21:10 jvanz