webhook: Admission metrics are not reported by status code
This is several bugs in one report, since multiple things appear to be broken. Apologies if it gets complicated.
Problem 1: It appears that controller-runtime webhooks always respond with HTTP status code 200 OK even if the request is completely malformed.
Example (sending a malformed request to a webhook endpoint):
curl -k -v https://localhost:9443/validate-ship-example-org-v1beta1-frigate
< HTTP/2 200
< ... other response headers ...
<
{"response":{"uid":"","allowed":false,"status":{"metadata":{},"message":"contentType=, expected application/json","code":400}}}
Problem 2: controller-runtime webhooks do not use metav1.Status.Code from the handler's response and always respond with HTTP status code 200 OK.
Example code I used to register my custom handler at /test:
webhookServer.Register("/test", &webhook.Admission{Handler: &myHandler{}})

...

type myHandler struct{}

func (*myHandler) Handle(context.Context, admission.Request) admission.Response {
	return admission.Response{
		AdmissionResponse: admissionv1.AdmissionResponse{
			Allowed: false,
			Result: &metav1.Status{
				Code:    403,
				Message: "things suck",
				Status:  "Failure",
				Reason:  metav1.StatusReasonForbidden,
			},
		},
	}
}
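For completeness, webhookServer above comes from the manager; roughly (a minimal sketch, assuming an otherwise standard manager setup where cfg is a *rest.Config, e.g. from ctrl.GetConfigOrDie()):

mgr, err := manager.New(cfg, manager.Options{})
if err != nil {
	panic(err)
}
webhookServer := mgr.GetWebhookServer()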
And here's the query I am making:
curl -XPOST -k -v -H "Content-Type: application/json" https://localhost:9443/test
< HTTP/2 200
...
<
{"kind":"AdmissionReview","apiVersion":"admission.k8s.io/v1","response":{"uid":"","allowed":false,"status":{"metadata":{},"status":"Failure","message":"things suck","reason":"Forbidden","code":403}}}
Problem 3: controller-runtime doesn't expose any metrics about rejected requests and their .status.code values. This is likely because
- we always report code=200 no matter the response (Problems 1 & 2)
- it appears we're capping the controller_runtime_webhook_requests_total{code=...} dimension to only 200 and 500 here: https://github.com/kubernetes-sigs/controller-runtime/blob/4cae9dfbf174fa5d49ebca007bd6f3784cf1ffb3/pkg/webhook/internal/metrics/metrics.go#L70-L76
so even if I make a faulty request, only the code=200 metric goes up:
controller_runtime_webhook_requests_total{code="200",webhook="/test"} 4
In an ideal state, I'd like the controller_runtime_webhook_requests_total metric to let me understand things like how many requests I'm denying (non-200 codes such as 403, 400, 429, ...).
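For illustration only (hypothetical output, not something controller-runtime emits today), denials from the /test handler above would then show up as something like:
controller_runtime_webhook_requests_total{code="403",webhook="/test"} 4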
/kind bug
I believe the 200 response just means the webhook was able to create a response. There was an issue around this with failure policy; not sure if it helps with your issue: https://github.com/kubernetes-sigs/controller-runtime/issues/2662.
Pretty much what troy said. I do agree that this is confusing and that it would be helpful to have metrics around the status code in the admission response.
Indeed, after creating the issue I've realized the HTTP response code must be 200 regardless of the admission response (which contains the status code the apiserver would return in its response body).
Maybe we could still add a metric that parses the code out from admission.Response? (I understand we can't help if someone actually decides to implement their own http.Handler).
/remove-kind bug
/kind feature
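To make that suggestion concrete, here is a rough sketch (not existing controller-runtime API; the metric name and the instrumentedHandler type are made up for illustration) of a wrapper that records the code embedded in the admission.Response, since the HTTP status itself is always 200:

package webhookmetrics

import (
	"context"
	"net/http"
	"strconv"

	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
	"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
)

// Hypothetical counter, keyed by webhook path and the .response.status.code
// that is returned to the apiserver inside the AdmissionReview body.
var admissionResponseCodes = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "controller_runtime_webhook_admission_response_code_total",
		Help: "Admission responses, partitioned by webhook path and embedded status code.",
	},
	[]string{"webhook", "code"},
)

func init() {
	// Register with controller-runtime's metrics registry so the counter is
	// scraped alongside the existing controller_runtime_* metrics.
	metrics.Registry.MustRegister(admissionResponseCodes)
}

// instrumentedHandler wraps another admission.Handler and records the status
// code carried in the admission.Response (not the HTTP status, which is 200).
type instrumentedHandler struct {
	path  string
	inner admission.Handler
}

func (h *instrumentedHandler) Handle(ctx context.Context, req admission.Request) admission.Response {
	resp := h.inner.Handle(ctx, req)
	code := int32(http.StatusOK)
	if resp.Result != nil && resp.Result.Code != 0 {
		code = resp.Result.Code
	}
	admissionResponseCodes.WithLabelValues(h.path, strconv.Itoa(int(code))).Inc()
	return resp
}

It would then be registered the same way as above, e.g. webhookServer.Register("/test", &webhook.Admission{Handler: &instrumentedHandler{path: "/test", inner: &myHandler{}}}); controller-runtime could do something equivalent inside webhook.Admission itself so every handler gets it for free.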
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
/reopen
/remove-lifecycle rotten
Adding a metric seems fine
@sbueringer: Reopened this issue.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
/lifecycle frozen