flink-on-k8s-operator

Webhook call failing due to bad cert

Open · jamesclair opened this issue 4 years ago · 1 comment

After upgrading the Flink operator, the webhook certificate somehow became unusable, and webhook calls to the operator fail when trying to deploy a flinkcluster CR.

environment

  • operator versions: v1beta1-9, v0.2.1

symptoms (kube-apiserver logs):

2021-06-28 07:52:58	
{"log":"W0628 12:52:58.747318       1 dispatcher.go:182] Failed calling webhook, failing closed mflinkcluster.flinkoperator.k8s.io: failed calling webhook \"mflinkcluster.flinkoperator.k8s.io\": Post \"https://flink-operator-webhook-service.flink-operator-system.svc:443/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster?timeout=30s\": x509: certificate signed by unknown authority\n","stream":"stderr","time":"2021-06-28T12:52:58.747487577Z"}
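For anyone scanning a large kube-apiserver log for this symptom, here is a minimal Python sketch that picks out entries like the one above. It assumes each line is a JSON object with a `log` field, as in the entry shown; the function name `webhook_tls_failure` is made up for illustration and is not part of the operator or of Kubernetes.

```python
import json
import re

# The log entry from this issue, kept as a raw string so the JSON escapes survive.
SAMPLE = r'{"log":"W0628 12:52:58.747318       1 dispatcher.go:182] Failed calling webhook, failing closed mflinkcluster.flinkoperator.k8s.io: failed calling webhook \"mflinkcluster.flinkoperator.k8s.io\": Post \"https://flink-operator-webhook-service.flink-operator-system.svc:443/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster?timeout=30s\": x509: certificate signed by unknown authority\n","stream":"stderr","time":"2021-06-28T12:52:58.747487577Z"}'

X509_ERROR = "x509: certificate signed by unknown authority"
WEBHOOK_RE = re.compile(r'failed calling webhook "([^"]+)"')


def webhook_tls_failure(line):
    """Return the webhook name if the line reports the x509 error, else None."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return None  # not a JSON-formatted log entry
    if not isinstance(entry, dict):
        return None
    message = entry.get("log", "")
    if X509_ERROR not in message:
        return None
    match = WEBHOOK_RE.search(message)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(webhook_tls_failure(SAMPLE))
```

A line that is not JSON, or that does not contain the x509 message, yields `None`, so the function can be mapped over a whole log file.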

workaround: Update the webhook certificate by deleting and recreating the flink-operator-system/cert-job, then re-sync or re-apply the flinkcluster CR.

source: How to update the webhook certificate · Issue #356 · GoogleCloudPlatform/flink-on-k8s-operator · GitHub

jamesclair · Jun 29 '21 19:06

I got the same error after updating a Helm chart (the update was only supposed to change requests and limits). After the update, the caBundle fields in the validating and mutating webhook configurations had been set to Cg==. I had to restore the CA in the webhooks manually.

olegy2008 · Sep 01 '21 14:09
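A note on the `Cg==` value from the comment above: it is the base64 encoding of a single newline, so a caBundle set to it contains no certificate at all. The Python sketch below checks for that condition. It assumes the input is a webhook configuration already parsed from JSON (for example, the output of `kubectl get mutatingwebhookconfiguration <name> -o json`), with the usual `webhooks[].clientConfig.caBundle` layout. The helper names and the sample dict are hypothetical, and the check only looks for a PEM marker; it does not verify that the certificate is valid or matches the webhook's serving certificate.

```python
import base64

PEM_MARKER = b"-----BEGIN CERTIFICATE-----"


def ca_bundle_is_usable(ca_bundle_b64):
    """True if the base64 caBundle decodes to data containing a PEM certificate marker."""
    try:
        decoded = base64.b64decode(ca_bundle_b64 or "", validate=True)
    except (ValueError, TypeError):
        return False  # not valid base64
    return PEM_MARKER in decoded


def webhooks_missing_ca(config):
    """Return names of webhooks whose clientConfig.caBundle holds no certificate."""
    missing = []
    for hook in config.get("webhooks", []):
        bundle = hook.get("clientConfig", {}).get("caBundle", "")
        if not ca_bundle_is_usable(bundle):
            missing.append(hook.get("name", "<unnamed>"))
    return missing


if __name__ == "__main__":
    # Hypothetical configuration mirroring the state described in this thread.
    broken = {
        "webhooks": [
            {
                "name": "mflinkcluster.flinkoperator.k8s.io",
                "clientConfig": {"caBundle": "Cg=="},
            }
        ]
    }
    print(base64.b64decode("Cg=="))     # the decoded bundle is just a newline
    print(webhooks_missing_ca(broken))  # lists the webhook with the empty bundle
```

Running a check like this after a chart upgrade would show whether the caBundle fields were wiped before any CR deployment fails.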