controller-runtime icon indicating copy to clipboard operation
controller-runtime copied to clipboard

webhook: Admission metrics are not reported by status code

Open ahmetb opened this issue 1 year ago • 13 comments

This is several bugs in one bug since it appears multiple things are broken. Apologies if it gets too complicated.

Problem 1: It appears that controller-runtime webhooks always respond with HTTP status code 200 OK even if the request is completely malformed.

Example (sending a malformed request to a webhook endpoint):

curl -k -v https://localhost:9443/validate-ship-example-org-v1beta1-frigate
< HTTP/2 200
< ... other response headers ...
<
{"response":{"uid":"","allowed":false,"status":{"metadata":{},"message":"contentType=, expected application/json","code":400}}}

Problem 2: controller-runtime webhooks do not use the metav1.Status.Code and always respond with HTTP status code 200 OK.

Example code I used to register my custom handler at /test:

webhookServer.Register("/test", &webhook.Admission{Handler: &myHandler{}})
...

type myHandler struct{}
func (*myHandler) Handle(context.Context, admission.Request) admission.Response {
	return admission.Response{
		AdmissionResponse: admissionv1.AdmissionResponse{
			Allowed: false,
			Result: &metav1.Status{
				Code:    403,
				Message: "things suck",
				Status:  "Failure",
				Reason:  metav1.StatusReasonForbidden}}}
}

And here's the query I am making:

curl -XPOST -k -v -H "Content-Type: application/json" https://localhost:9443/test

< HTTP/2 200
...
<
{"kind":"AdmissionReview","apiVersion":"admission.k8s.io/v1","response":{"uid":"","allowed":false,"status":{"metadata":{},"status":"Failure","message":"things suck","reason":"Forbidden","code":403}}}

Problem 3: controller-runtime doesn't expose any metrics about rejected requests and their .status.codes. This is likely because

  1. we always report code=200 no matter the response (Problem 1 & 2)
  2. it appears we're capping controller_runtime_webhook_requests_total{code=...} dimension to only 200 and 500 here: https://github.com/kubernetes-sigs/controller-runtime/blob/4cae9dfbf174fa5d49ebca007bd6f3784cf1ffb3/pkg/webhook/internal/metrics/metrics.go#L70-L76

so even if I make a faulty request, only the code=200 metric goes up:

controller_runtime_webhook_requests_total{code="200",webhook="/test"} 4

In an ideal state, I'd like to see controller_runtime_webhook_requests_total metric actually let me understand things like how many requests I'm denying (non-200 codes, such as 403, 400, 429, ...).

/kind bug

ahmetb avatar Oct 09 '24 15:10 ahmetb

I believe the 200 response is if the webhook can create the response. There was an issue around this with failure policy and not sure if this helps with your issue https://github.com/kubernetes-sigs/controller-runtime/issues/2662.

troy0820 avatar Oct 10 '24 18:10 troy0820

Pretty much what troy said. I do agree that this is confusing and that it would be helpful to have metrics around the status code in the admission response.

alvaroaleman avatar Oct 12 '24 19:10 alvaroaleman

Indeed, after creating the issue I've realized HTTP response code must be 200 regardless of the admission response (which contains the status code the apiserver would return in its response body).

Maybe we could still add a metric that parses the code out from admission.Response? (I understand we can't help if someone actually decides to implement their own http.Handler).

ahmetb avatar Oct 15 '24 18:10 ahmetb

/remove-kind bug /kind feature

ahmetb avatar Oct 16 '24 05:10 ahmetb

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 14 '25 05:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Feb 13 '25 06:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Mar 15 '25 06:03 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Mar 15 '25 06:03 k8s-ci-robot

/reopen /remove-lifecycle rotten

Adding a metric seems fine

sbueringer avatar Mar 22 '25 05:03 sbueringer

@sbueringer: Reopened this issue.

In response to this:

/reopen /remove-lifecycle rotten

Adding a metric seems fine

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Mar 22 '25 05:03 k8s-ci-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 24 '25 20:06 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jul 24 '25 20:07 k8s-triage-robot

/remove-lifecycle rotten /lifecycle frozen

sbueringer avatar Jul 25 '25 05:07 sbueringer