kyverno Debug mode which shows entire AdmissionReview payload

Is your feature request related to a problem? Please describe.

Writing policies that aren't documented (or with samples) or are complex or unusual can be difficult (no schema validation makes this extraordinarily so). Sometimes I'm guessing at the statements to make in a policy because I don't know what the incoming payload from AdmissionReview looks like, so although the statement might be valid, it doesn't align with the incoming data. Scurrying off to GitHub to create an issue or to Slack to post a question every time a user is having difficulty creating a policy that accomplishes their goals doesn't scale well and takes time away from maintainers working to improve the code. We can do more to remove much of the guesswork to write policies/rules that accomplish the desired behavior.

Describe the solution you'd like

Create a debug switch/advanced mode which shows in the logs the entire AdmissionReview payload that got sent to Kyverno. This accomplishes two major things:

Gives users a tool to help themselves as they can see the entire data contents of some action they performed and get a better sense on how to write a rule that matches it.
Allows more accurate reporting to maintainers of true bugs.

Describe alternatives you've considered

Enable auditing for kube-apiserver (not possible on some K8s distros) and dive through a sea of logs.
Continue to bug Jim, Shuting, et al every time a custom-written policy or rule doesn't work and you don't know why.

Additional context Having this debug ability would have pointed me in the right direction in this case which, although it still wouldn't have worked given the current code, would have at least told me that I needed to match on request.oldObject rather than request.object.

Nov 29 '20 14:11 chipzoller

Another practical use case I face is when writing a policy for https://github.com/kyverno/kyverno/issues/1069. I don't know what type of operation this is so I'm now forced to guess or dive through api-server logs. Is it an UPDATE? What's the AdmissionReview structure of the request? Don't know.

Dec 13 '20 14:12 chipzoller

Some ideas for implementing this function:

Available as a flag which can be passed into the Kyverno container as a param which can take a few args:

Length of time from this point when AdmissionReview payloads are dumped (ex. 60 seconds), after which it returns to normal logging behavior.
Operation types it should dump (ex., only CREATE)
Kubernetes kinds it should dump (ex., only Services)

Not available as a flag but can be injected by Kyverno CLI (same params as above).

All params could be available together with a syntax of:

kyverno --dump-payload=true --dump-payload-timer=60s --dump-payload-ops=CREATE --dump-payload-kinds=Service

Jan 21 '21 17:01 chipzoller

/assign

May 28 '21 01:05 viveksahu26

I like the idea of using flags for this....can we also enable / disable via the ConfigMap? If so, do we really need to support a time (--dump-payload-timer) - as that requires managing some additional state?

May 28 '21 01:05 JimBugwadia

The timer was an idea to prevent Kyverno from destroying itself due to high volume of admission review requests. Could try without it and see what happens.

May 28 '21 11:05 chipzoller

Seems like all fields in the webhook rules can be used:

  rules:
  - apiGroups:   [""]
    apiVersions: ["v1"]
    operations:  ["CREATE"]
    resources:   ["pods"]
    scope:       "Namespaced"

https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#matching-requests-rules

Jul 05 '21 20:07 JimBugwadia

Hello, I'd like to pick up this issue!

Nov 09 '21 17:11 zeborg

@zeborg - we will need to discuss design options for this.

The two options I see are:

Print details in a log
Expose a GRPC port, or similar, to send audit trail details to

The AdmissionRequest may also contain raw data including secrets and other sensitive information. Hence, we need the solution to be secure.

Nov 09 '21 20:11 JimBugwadia

This might be something that can be printed at verbosity level of like 6 or something. The same concerns exist here as those which already exist where variables are printed when resolved.

Nov 09 '21 21:11 chipzoller

Agreed. If there are log statements that can accidently expose secrets as variable values, lets remove them or make sure the data is masked.

Nov 10 '21 02:11 JimBugwadia

Hey @chipzoller @JimBugwadia, here are some updates on the issue (sorry for the long delay!):

Logging the AdmissionReview or AdmissionRequest payload actually exposes the variable values in base64 format in case of a resource with kind: Secret.
The logged payload is printed in a single-line unformatted state containing lots of escape characters, which makes it quite hard to make sense of in case of huge logs.
The verbosity seems to be static, and no matter what the -v flag is set to, the verbosity level won't go above 2, which really seems like a bug to me unless that's how it is meant to be.
I found multiple logging packages being used simultaneously (log, klog, klogr, logr) throughout the project, although by observing the source of each it seems like they're all correlated and hence used in conjunction. I'd still appreciate any help in understanding what purpose each serves when combined together so that I can come up with an approach track down the issue with verbosity.

Here's an example of the AdmissionRequest log for a resource with kind: Secret:

I1116 07:38:47.044529       1 server.go:490] WebhookServer/ValidateWebhook "msg"="admissionrequest payload: {\"uid\":\"5085bc8a-a4ea-4769-a515-7f58829b3a64\",\"kind\":{\"group\":\"\",\"version\":\"v1\",\"kind\":\"Secret\"},\"resource\":{\"group\":\"\",\"version\":\"v1\",\"resource\":\"secrets\"},\"requestKind\":{\"group\":\"\",\"version\":\"v1\",\"kind\":\"Secret\"},\"requestResource\":{\"group\":\"\",\"version\":\"v1\",\"resource\":\"secrets\"},\"name\":\"mysecret\",\"namespace\":\"default\",\"operation\":\"CREATE\",\"userInfo\":{\"username\":\"kubernetes-admin\",\"groups\":[\"system:masters\",\"system:authenticated\"]},\"object\":{\"kind\":\"Secret\",\"apiVersion\":\"v1\",\"metadata\":{\"name\":\"mysecret\",\"namespace\":\"default\",\"uid\":\"4c543faf-1012-480d-a8ae-d1bd205f11dd\",\"creationTimestamp\":\"2021-11-16T07:38:47Z\",\"labels\":{\"purpose\":\"development\"},\"annotations\":{\"kubectl.kubernetes.io/last-applied-configuration\":\"{\\\"apiVersion\\\":\\\"v1\\\",\\\"data\\\":{\\\"password\\\":\\\"MWYyZDFlMmU2N2Rm\\\",\\\"username\\\":\\\"YWRtaW4=\\\"},\\\"kind\\\":\\\"Secret\\\",\\\"metadata\\\":{\\\"annotations\\\":{},\\\"labels\\\":{\\\"purpose\\\":\\\"development\\\"},\\\"name\\\":\\\"mysecret\\\",\\\"namespace\\\":\\\"default\\\"},\\\"type\\\":\\\"Opaque\\\"}\\n\"},\"managedFields\":[{\"manager\":\"kubectl-client-side-apply\",\"operation\":\"Update\",\"apiVersion\":\"v1\",\"time\":\"2021-11-16T07:38:47Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:data\":{\".\":{},\"f:password\":{},\"f:username\":{}},\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:kubectl.kubernetes.io/last-applied-configuration\":{}},\"f:labels\":{\".\":{},\"f:purpose\":{}}},\"f:type\":{}}}]},\"data\":{\"password\":\"MWYyZDFlMmU2N2Rm\",\"username\":\"YWRtaW4=\"},\"type\":\"Opaque\"},\"oldObject\":null,\"dryRun\":false,\"options\":{\"kind\":\"CreateOptions\",\"apiVersion\":\"meta.k8s.io/v1\",\"fieldManager\":\"kubectl-client-side-apply\"}}" "kind"="Secret" "name"="mysecret" "namespace"="default" "operation"="CREATE" "uid"="5085bc8a-a4ea-4769-a515-7f58829b3a64"

ClusterPolicy used above for testing:

apiVersion: kyverno.io/v1
# The `ClusterPolicy` kind applies to the entire cluster.
kind: ClusterPolicy
metadata:
  name: require-ns-purpose-label
# The `spec` defines properties of the policy.
spec:
  # The `validationFailureAction` tells Kyverno if the resource being validated should be allowed but reported (`audit`) or blocked (`enforce`).
  validationFailureAction: enforce
  # The `rules` is one or more rules which must be true.
  rules:
  - name: require-ns-purpose-label
    # The `match` statement sets the scope of what will be checked. In this case, it is any `Namespace` resource.
    match:
      resources:
        kinds:
        - Secret
    # The `validate` statement tries to positively check what is defined. If the statement, when compared with the requested resource, is true, it is allowed. If false, it is blocked.
    validate:
      # The `message` is what gets displayed to a user if this rule fails validation and is therefore blocked.
      message: "You must have label `purpose` with value `production` set on all new namespaces."
      # The `pattern` object defines what pattern will be checked in the resource. In this case, it is looking for `metadata.labels` with `purpose=production`.
      pattern:
        metadata:
          labels:
            purpose: production

Resource used above for testing:

apiVersion: v1
kind: Secret
metadata:
  name: mysecret
  labels:
    purpose: development
type: Opaque
data:
  username: YWRtaW4=
  password: MWYyZDFlMmU2N2Rm

Nov 20 '21 16:11 zeborg

In order to make the UX nice, I wonder if we can redact values in secrets like this and where they're consumed. It's probably require more work as we couldn't just look for kind: Secret. The obvious challenge here is balancing securing this output with being able to switch it on and consume it simply.

Nov 20 '21 17:11 chipzoller

I propose we decouple this functionality from logging, as there are other use cases to consider for observability.

@zeborg - regarding the question on the different loggers, its a bit of a mess. We had started with glog and then moved to zap but need to stay compatible with Kubernetes loggers. We also do not have a great way to internally manage module level loggers, and method signatures are polluted with loggers being passed around. Can you start a separate issue to track and discuss this area?

Nov 20 '21 21:11 JimBugwadia

Not strictly related, but adding this project: https://github.com/anderseknert/kube-review

Jan 19 '22 23:01 chipzoller

Another troubleshooting use case: https://github.com/kyverno/kyverno/issues/3261

Feb 24 '22 17:02 chipzoller

Chiming in to say this would be a great feature, I'm finding it quite hard to debug exactly why a policy I'm working on isn't working.

Apr 30 '22 04:04 mogopz

I think that @chipzoller has very succinctly identified the problem in this one line

Sometimes I'm guessing at the statements to make in a policy because I don't know what the incoming payload from AdmissionReview looks like, so although the statement might be valid, it doesn't align with the incoming data

Users want to be able to troubleshoot and debug issues themselves and having to resort to Slack or Github issues every time is just not feasible. Furthermore having debug capabilities helps one understand the tool (Kyverno) even better.

Aug 04 '22 15:08 ghost

Tacking on a use-case here, we have hit a problem where an existing in-house mutating webhook and Kyverno are causing unexpected (and baffling) interactions that result in pods being unable to run. Kyverno runs after this other webhook, so its hard for us to determine what the state of the admission request is when it arrives at Kyverno to figure out the source of the problem. Slack discussion here - https://kubernetes.slack.com/archives/CLGR9BJU9/p1661955120210759

Aug 31 '22 16:08 zswanson

cc: @yimikao @eddycharly

Sep 17 '22 21:09 JimBugwadia

@yimikao I wanted to assign it to you (for some reason it doesn't work 🤔)

Oct 14 '22 19:10 eddycharly

@yimikao must respond in this issue to be eligible for assignment.

Oct 14 '22 19:10 chipzoller

/assign

Oct 14 '22 19:10 ghost

@eddycharly

Oct 14 '22 19:10 ghost

kyverno kyverno copied to clipboard

Debug mode which shows entire AdmissionReview payload

kyverno
kyverno copied to clipboard