flink-on-k8s-operator
Webhook call failing due to bad cert
After upgrading the Flink operator, the webhook certificate became unusable: calls to the operator's webhook fail certificate verification when trying to deploy a FlinkCluster CR.
environment
- operator versions: v1beta1-9, v0.2.1
symptoms
kube-apiserver logs:
2021-06-28 07:52:58
{"log":"W0628 12:52:58.747318 1 dispatcher.go:182] Failed calling webhook, failing closed mflinkcluster.flinkoperator.k8s.io: failed calling webhook \"mflinkcluster.flinkoperator.k8s.io\": Post \"https://flink-operator-webhook-service.flink-operator-system.svc:443/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster?timeout=30s\": x509: certificate signed by unknown authority\n","stream":"stderr","time":"2021-06-28T12:52:58.747487577Z"}
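To find this failure in a larger batch of kube-apiserver logs, a minimal sketch like the one below may help. It assumes every line is a JSON object with `log` and `time` keys, as in the excerpt above. The function name `parse_webhook_failure` and the regular expression are illustrative, derived from that single sample line, and are not part of any Kubernetes tooling.

```python
import json
import re

# Pattern derived from the one sample line above (an assumption, not a
# documented log format): webhook name, target URL, then the error text.
FAILURE_RE = re.compile(
    r'failed calling webhook "(?P<webhook>[^"]+)": '
    r'Post "(?P<url>[^"]+)": (?P<reason>.+)'
)


def parse_webhook_failure(raw_line):
    """Return details of a failed webhook call, or None if the line is unrelated."""
    try:
        entry = json.loads(raw_line)
    except json.JSONDecodeError:
        return None
    if not isinstance(entry, dict):
        return None
    match = FAILURE_RE.search(str(entry.get("log", "")))
    if match is None:
        return None
    return {
        "webhook": match.group("webhook"),
        "url": match.group("url"),
        "reason": match.group("reason").strip(),
        "time": entry.get("time"),
    }
```

A `reason` beginning with `x509:` points at the certificate or CA bundle rather than at networking or the operator pod itself.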
workaround
Update the webhook certificate by deleting and recreating the flink-operator-system/cert-job job.
Then re-sync or re-apply the FlinkCluster CR.
source: How to update the webhook certificate · Issue #356 · GoogleCloudPlatform/flink-on-k8s-operator · GitHub
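The workaround can be scripted roughly as follows. This is only a sketch under several assumptions: that `cert-job` is a Kubernetes Job in the `flink-operator-system` namespace, that it reaches the `complete` condition once the certificate is regenerated, and that you have the manifests at hand. The two manifest paths are hypothetical parameters, and the 120 second timeout is an arbitrary choice. By default the script only prints the `kubectl` commands instead of running them.

```python
import subprocess

NAMESPACE = "flink-operator-system"  # namespace named in the workaround
CERT_JOB = "cert-job"                # job named in the workaround


def build_workaround_commands(cert_job_manifest, flinkcluster_manifest):
    """Return the kubectl invocations for the workaround, in order.

    Both manifest paths are hypothetical; point them at your own files.
    """
    return [
        # 1. Remove the old job so that it can be created again.
        ["kubectl", "delete", "job", CERT_JOB, "-n", NAMESPACE,
         "--ignore-not-found"],
        # 2. Recreate the job from its manifest.
        ["kubectl", "apply", "-f", cert_job_manifest],
        # 3. Wait for the job to finish (timeout value is an assumption).
        ["kubectl", "wait", "--for=condition=complete", f"job/{CERT_JOB}",
         "-n", NAMESPACE, "--timeout=120s"],
        # 4. Re-apply the FlinkCluster CR.
        ["kubectl", "apply", "-f", flinkcluster_manifest],
    ]


def run_workaround(cert_job_manifest, flinkcluster_manifest, dry_run=True):
    """Print the commands (dry_run=True) or execute them, stopping on error."""
    commands = build_workaround_commands(cert_job_manifest, flinkcluster_manifest)
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
    return commands
```

Calling `run_workaround("cert-job.yaml", "flinkcluster.yaml")` with placeholder file names prints the four commands for review; pass `dry_run=False` only after checking them against your cluster.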
Got the same after updating a Helm chart (the update was only supposed to change requests and limits). After the update, the caBundle fields in the validating and mutating webhooks were set to Cg==. Had to restore the CA in the webhooks manually.
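The value `Cg==` is the base64 encoding of a single newline character, so a webhook carrying it has effectively no CA to verify the serving certificate against. A small check along these lines can flag such a value; the helper name `ca_bundle_is_usable` is illustrative, and looking for a PEM header is a rough heuristic, not a full certificate validation.

```python
import base64

PEM_MARKER = b"-----BEGIN CERTIFICATE-----"


def ca_bundle_is_usable(ca_bundle_b64):
    """Return True if the base64 caBundle decodes to data containing a PEM certificate.

    Heuristic only: it does not verify that the certificate is valid
    or that it matches the webhook's serving certificate.
    """
    try:
        decoded = base64.b64decode(ca_bundle_b64 or "", validate=True)
    except (ValueError, TypeError):
        return False
    return PEM_MARKER in decoded


# The value reported above decodes to a lone newline, so it fails the check.
print(base64.b64decode("Cg=="))     # b'\n'
print(ca_bundle_is_usable("Cg=="))  # False
```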