Eventing webhook fails to start / stuck in crash loop

Open maylukas opened this issue 9 months ago • 3 comments

Describe the bug: A clean installation using the operator fails. The eventing-webhook is in a crash loop.

Logs of the eventing webhook:

2024/05/02 10:43:54 Registering 5 informer factories
2024/05/02 10:43:54 Registering 7 informers
2024/05/02 10:43:54 Registering 7 controllers
{"level":"info","ts":"2024-05-02T10:43:55.083Z","logger":"eventing-webhook","caller":"profiling/server.go:65","msg":"Profiling enabled: false","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7"}
{"level":"info","ts":"2024-05-02T10:43:55.112Z","logger":"eventing-webhook","caller":"leaderelection/context.go:47","msg":"Running with Standard leader election","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7"}
{"level":"info","ts":"2024-05-02T10:43:55.139Z","logger":"eventing-webhook","caller":"sinkbinding/controller.go:194","msg":"Starting global resync of SinkBindings every 30m0s","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7"}
{"level":"info","ts":"2024-05-02T10:43:55.176Z","logger":"eventing-webhook","caller":"sharedmain/main.go:283","msg":"Starting configuration manager...","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7"}
{"level":"info","ts":"2024-05-02T10:43:55.264Z","logger":"eventing-webhook","caller":"sinkbinding/controller.go:89","msg":"feature config changed. name: config-features, value: map[authentication-oidc:Disabled cross-namespace-event-links:Disabled delivery-retryafter:Disabled delivery-timeout:Enabled eventtype-auto-create:Disabled kreference-group:Disabled kreference-mapping:Disabled new-trigger-filters:Enabled transport-encryption:Disabled]","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7"}
{"level":"info","ts":1714646635.2782583,"logger":"fallback","caller":"injection/injection.go:63","msg":"Starting informers..."}
{"level":"warn","ts":"2024-05-02T10:43:55.778Z","logger":"eventing-webhook","caller":"webhook/webhook.go:197","msg":"server key missing","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7"}
{"level":"error","ts":"2024-05-02T10:43:55.778Z","logger":"eventing-webhook","caller":"webhook/webhook.go:248","msg":"http: TLS handshake error from 51.75.198.249:39398: tls: no certificates configured\n","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7","stacktrace":"knative.dev/pkg/webhook.(*zapWrapper).Write\n\tknative.dev/[email protected]/webhook/webhook.go:248\nlog.(*Logger).output\n\tlog/log.go:245\nlog.(*Logger).Printf\n\tlog/log.go:268\nnet/http.(*Server).logf\n\tnet/http/server.go:3411\nnet/http.(*conn).serve\n\tnet/http/server.go:1930"}
{"level":"warn","ts":"2024-05-02T10:43:55.863Z","logger":"eventing-webhook","caller":"webhook/webhook.go:197","msg":"server key missing","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7"}
{"level":"error","ts":"2024-05-02T10:43:55.863Z","logger":"eventing-webhook","caller":"webhook/webhook.go:248","msg":"http: TLS handshake error from 51.75.198.249:39410: tls: no certificates configured\n","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7","stacktrace":"knative.dev/pkg/webhook.(*zapWrapper).Write\n\tknative.dev/[email protected]/webhook/webhook.go:248\nlog.(*Logger).output\n\tlog/log.go:245\nlog.(*Logger).Printf\n\tlog/log.go:268\nnet/http.(*Server).logf\n\tnet/http/server.go:3411\nnet/http.(*conn).serve\n\tnet/http/server.go:1930"}
{"level":"warn","ts":"2024-05-02T10:43:56.863Z","logger":"eventing-webhook","caller":"webhook/webhook.go:197","msg":"server key missing","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7"}
{"level":"error","ts":"2024-05-02T10:43:56.863Z","logger":"eventing-webhook","caller":"webhook/webhook.go:248","msg":"http: TLS handshake error from 51.75.198.249:47560: tls: no certificates configured\n","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7","stacktrace":"knative.dev/pkg/webhook.(*zapWrapper).Write\n\tknative.dev/[email protected]/webhook/webhook.go:248\nlog.(*Logger).output\n\tlog/log.go:245\nlog.(*Logger).Printf\n\tlog/log.go:268\nnet/http.(*Server).logf\n\tnet/http/server.go:3411\nnet/http.(*conn).serve\n\tnet/http/server.go:1930"}
{"level":"warn","ts":"2024-05-02T10:43:57.840Z","logger":"eventing-webhook","caller":"webhook/webhook.go:197","msg":"server key missing","commit":"e23ebab","knative.dev/pod":"eventing-webhook-5874bb8445-rz6t7"}

Knative Eventing Resource

apiVersion: operator.knative.dev/v1beta1
kind: KnativeEventing
metadata:
  name: knative-eventing
  namespace: knative-eventing
spec:
  source:
    rabbitmq:
      enabled: true
  version: 1.14.0

Expected behavior: Installation of Knative Eventing should be successful.

To Reproduce:

1. Install cert-manager (1.14.5)
2. Install trust-manager (0.7.1)
3. Install istio (1.21.2)
4. Install Knative operator (1.14.0)
5. Install Knative Serving (1.14.0)
6. Install Knative Eventing (1.14.0)

Knative release version: 1.14.0

maylukas, May 02 '24 10:05

We're also seeing issues with the "routing-serving-certs" issuance:

Failed to wait for order resource "routing-serving-certs-1-422265175" to become ready: order is in "errored" state: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "kn-routing": Domain name needs at least one dot

maylukas, May 02 '24 12:05

cc @pierDipi

Cali0707, May 13 '24 18:05

I think these comments are relevant here: https://github.com/knative/pkg/issues/2560#issuecomment-1195840564 and https://github.com/knative/pkg/issues/2560#issuecomment-1195842825, in particular these parts:

I'm curious what cert is the webhook presenting and see what's defined in your CA bundle of the configured webhook (ie. ValidatingWebhookConfiguration and MutatingWebhookConfiguration)

and

The typical misconfiguration we see is if the liveness probe timeout of the webhook deployment is too low - it never gets a chance to become the leader and create the certificate. This is because K8s kills the container.

pierDipi, May 14 '24 14:05

We were able to solve this issue by increasing the memory limits and requests.
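
For readers hitting the same crash loop, here is a minimal sketch of how such an increase might be expressed through the operator's `spec.workloads` resource override. It assumes the override targets the `eventing-webhook` deployment and its container of the same name; the CPU and memory values are illustrative placeholders, not the values used in this thread.

```yaml
apiVersion: operator.knative.dev/v1beta1
kind: KnativeEventing
metadata:
  name: knative-eventing
  namespace: knative-eventing
spec:
  source:
    rabbitmq:
      enabled: true
  version: 1.14.0
  workloads:
    # Assumption: the webhook deployment and its container are both named eventing-webhook.
    - name: eventing-webhook
      resources:
        - container: eventing-webhook
          requests:
            cpu: 100m      # placeholder value
            memory: 128Mi  # placeholder value, tune for your cluster
          limits:
            cpu: 500m      # placeholder value
            memory: 512Mi  # placeholder value, tune for your cluster
```

If the webhook container is being killed before it can become leader and create its certificate, the pod's last termination reason (for example `OOMKilled`) and restart count should indicate whether memory is the constraint.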

maylukas, Aug 02 '24 08:08