Unallowed admission reviews to not reply to the requestor with http status code 200
/kind bug
Expected Behavior
The admission controller to reply to k8s API server on failed admission requests with http code != 2XX.
Actual Behavior
Even when the request is not allowed, the admission handler just return the admission review to the api server without explicitly setting HTTP status code, therefore the default status code of 200 is used. For k8s, this is considered as successful admission review and therefore the failure policy Ignore is not considered, hence the request fails.
The code writing the response is here https://github.com/knative/pkg/blob/12be06090b5122f58fa8b3a9870e26fb6de0bb20/webhook/admission.go#L136.
Steps to Reproduce the Problem
- Get access to some k8s cluster, or just create a fresh one with kind,
$ kind create cluster
- Deploy customized version of cosigned
$ kubectl create -f https://gist.githubusercontent.com/vpnachev/acd2b9ba3f457d6d05e44fd9c3a69fca/raw/111be95a393ea6cc2c63d593459553aec09219f7/cosigned-v1.5.2-with-ignore-failure-policy.yaml
- this is just rendered template of the chart with changed failure policy to
Ignoreand addedNamespaceresource. - also a dummy public key is added
- Create a namespace and label it with
cosigned.sigstore.dev/include=true
$ kubectl create namespace test-ignore-policy
$ kubectl label namespace test-ignore-policy cosigned.sigstore.dev/include="true"
-
Wait the
cosigned-webhookin thecosignednamespace to becomeRunning. -
Try to run pod in the
test-ignore-policynamespace and make sure thecreate/podrequest fails
$ kubectl -n test-ignore-policy run test --image=alpine:3.15.0
Error from server (BadRequest): admission webhook "cosigned.sigstore.dev" denied the request: validation failed: one of verifier or root certs is required: spec.containers[0].image
index.docker.io/library/alpine@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300
- Make sure there is no pod
testin thetest-ignore-policynamespace
$ kubectl -n test-ignore-policy get pod test
Error from server (NotFound): pods "test" not found
Additional Info
- https://kubernetes.slack.com/archives/C0EG7JC6T/p1650375336872579
- https://github.com/kubernetes/website/issues/33039
Does this: https://github.com/knative/pkg/blob/12be06090b5122f58fa8b3a9870e26fb6de0bb20/webhook/admission.go#L137 not set the status code to 500?
No, it does not. json.NewEncoder(w).Encode(response) returns nil and nothing else is explicitly written to http.ResponseWriter. When the HTTP code is not explicitly set, the default 200 is used, ref
// If WriteHeader has not yet been called, Write calls // WriteHeader(http.StatusOK) before writing the data.
Very interesting. Talking with @mattmoor about this now.
Judging from this portion of the slack conversation:

I think the only potential issue here is that the json encoder may partially write to the response before erroring out, so the http.Error will be ineffective because as we start to write bytes the response writer will automatically use 200.
However, since this should result in a partial object, Kubernetes should reject the response because it is an invalid AdmissionReview, so this is mostly defensive (e.g. if nothing is written). So short of buffering the response so we can definitively return a 500, I think what we're doing is correct.
The admission controller to reply to k8s API server on failed admission requests with http code != 2XX.
This statement from the expected behavior above seems inconsistent with Jordan's third comment in my screenshot. failurePolicy does NOT control the behavior when the admission controller returns a valid AdmissionReview rejecting the request.
So unless I'm missing something, which is entirely possible, this seems like it is WAI to me.
It depends how the failure policy is used and what is the expectation.
During the last 4 years I was always expecting that failure policy Ignore will allow even requests rejected by admission controller to be completed. And we have couple of tens of admission controllers that were used like this. In the initial phase we deploy them with Ignore and just later, when we have some experience and the bugs are fixed, we switch the failure policy to Fail.
Now, I want to deploy cosigned (component using knative.dev/pkg as SDK for admission controller checking if OCI images are signed), but I know cosigned will reject a lot of the requests initially, so I wanted to deploy it with failure policy Ignore, and observe in the monitoring which images are failing the check. After some time, all our images will be successfully admitted, and then I can change the failure policy to Fail. I just want to use some mechanism to use silent failures without causing disruptions and enforce the validation only after all prerequisites are met.
I may be misunderstanding @liggitt but my impression is that what you describe is not what failurePolicy was intended to handle.
Yes, I find yesterday that I was having wrong understanding how failure policies work. We (maybe also others) were just using it as loophole to configure admission controllers to fail silently.
This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.