flink-on-k8s-operator
Helm Chart install uses self-signed cert
The Helm chart creates a self-signed cert which is being rejected by kubectl apply when I try to create a job cluster.
michael@michael:~$ sudo kubectl apply -f cog/authoring/test.yaml
[sudo] password for michael:
Error from server (InternalError): error when creating "cog/authoring/test.yaml": Internal error occurred: failed calling webhook "mflinkcluster.flinkoperator.k8s.io": Post https://flink-operator-webhook-service.flink-operator-system.svc:443/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster?timeout=30s: x509: certificate signed by unknown authority
Is this a helm chart specific problem? Does it manifest with make deploy?
What does cog/authoring/test.yaml do exactly? Can you attach the exact install commands you've run in CLI?
I am also seeing this issue. It seems possible that the webhook certs are being overwritten after a helm upgrade. I'm using Helm 3. In the output below, clientConfig.caBundle had a value and was then set to the blank Cg== (the base64 encoding of a single newline), e.g.:
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"admissionregistration.k8s.io/v1beta1","kind":"MutatingWebhookConfiguration","metadata":{"annotations":{},"creationTimestamp":null,"name":"flink-operator-mutating-webhook-configuration"},"webhooks":[{"clientConfig":{"caBundle":"LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM2ekNDQWRNQ0ZCL0ZDekZOWk1Za2hIQmhjeGszZzJmMTB3UnpNQTBHQ1NxR1NJYjNEUUVCQ3dVQU1Db3gKS0RBbUJnTlZCQU1NSDBGa2JXbHpjMmx2YmlCRGIyNTBjbTlzYkdWeUlGZGxZbWh2YjJzZ1EwRXdIaGNOTWpBdwpOekUyTURBek5UUXlXaGNOTWpBd09ERTFNREF6TlRReVdqQTZNVGd3TmdZRFZRUUREQzltYkdsdWF5MXZjR1Z5CllYUnZjaTEzWldKb2IyOXJMWE5sY25acFkyVXVabXhwYm1zdGMzbHpkR1Z0TG5OMll6Q0NBU0l3RFFZSktvWkkKaHZjTkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFMN1g2Znk5YnZwZHRkYlYrVm5pdVZWSFNCVzhsbmM1VmYxMgpMMUpMcGFzYjhSc0pyak91eXJ4SkkxSGFmMVczczRWM2tqTE84ZnptQ2FPUVJFeWxaRElpaXc2S3dDdTgwSmV2CmdWc2RGd0twY1ZhOW1JWUJJVHZZTGpqdDNSbHBTZ3U3ZStURzgzcUUxYkhlV2lCa1IyQVdvb3ZVbVYvakU0dUMKREFhUHJzOTJKMG1Xbm9QMWErODh4Z2g2eE5zZ2xYRjlZQmk3RzBGL04ybFZnNXJnczNvTXpEdXU0cWlmV0d6bwoxNENpSFowWGwrbnAwdDRuY3pJRk4rck5yN3RMd3B0SDJzL0pTaUYwSlk3eUhMVVZBeFhPSWh4d2RoelFjQ0FiClU1UTRRWVU1Z0IvL1RjamNIRWVtUkUzYUQwTFY3TDlkRkplRVFIZ25ydjR6VDU3VUZLRUNBd0VBQVRBTkJna3EKaGtpRzl3MEJBUXNGQUFPQ0FRRUFhYmhxaG1wcWdOd2pIQmJPa1BwSC8zeGhVVERYaGVzQXd2Yi8wdDhCaVRILwpJWU1xcGpNRWRVR2hKclZ1cGVWNGVGRkxFNG51VUd6SmVxcmY4NFBtdUZaN0EweVkyd3czV2RKZ3gvN0xFcmJYCi90MEhMMWVHMzhyR3FFenFvZk5mWFUvSytnclVkWW8rWGdqWFluZnY5WXBKbHJnYzRIeDZqN0ZjN2thVmJKR0cKcnY3bHF4M0pOYlNIZkI0b2JtYnc1dFpLODRRbEhuN215aFoxdHowbDNsblF4TGVHdUdTdXN0STBxaVFtcU5SQQpiN0tVSGkwMUxxN0l1MkgvdXJ4OXJwamJzYjNPd2xEYUlaS2pOTUYySmxoYS8xT0FCc0x5T1B1NFJtaExiUCt0CnIvUHFWcnR3WTRNYUZpd2Vjc0VaR1BLWHZmaE1pRW1DdVZYVXRUenk0dz09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K","service":{"name":"flink-operator-webhook-service","namespace":"flink-system","path":"/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster"}},"failurePolicy":"Fail","name":"mflinkcluster.flinkoperator.k8s.io","rules":[{"apiGroups":["flinkoperator.k8s.io"],"apiVersions":["v1beta1"],"operations":["CREATE","UPDATE"],"resources":
["flinkclusters"]}]}]}
  creationTimestamp: "2020-07-16T00:35:30Z"
  generation: 3
  name: flink-operator-mutating-webhook-configuration
  resourceVersion: "162635766"
  selfLink: /apis/admissionregistration.k8s.io/v1/mutatingwebhookconfigurations/flink-operator-mutating-webhook-configuration
  uid: f1fb4a9c-b6ae-47c1-8cf4-f502d1ddf9b6
webhooks:
- admissionReviewVersions:
  - v1beta1
  clientConfig:
    caBundle: Cg==
    service:
      name: flink-operator-webhook-service
      namespace: flink-system
      path: /mutate-flinkoperator-k8s-io-v1beta1-flinkcluster
      port: 443
  failurePolicy: Fail
  matchPolicy: Exact
  name: mflinkcluster.flinkoperator.k8s.io
  namespaceSelector: {}
  objectSelector: {}
  reinvocationPolicy: Never
  rules:
  - apiGroups:
    - flinkoperator.k8s.io
    apiVersions:
    - v1beta1
    operations:
    - CREATE
    - UPDATE
    resources:
    - flinkclusters
    scope: '*'
  sideEffects: Unknown
  timeoutSeconds: 30
The Helm chart generates its certs as self-signed certs in a task within the chart. It should at least use the cluster CA to generate the cert.
The Cg== you are seeing is supposed to be replaced by the self-generated cert; this was a move to decouple the Flink operator from cert-manager. Job clusters could be created successfully at the time the last chart release was made. It could be that some components in the Helm chart have become outdated since then and need to be synced with the most up-to-date operator code base. I'll try to make a new chart and see if that solves the problem.
I'm running into this same issue. Initially the certificate is correct, then if you update anything it becomes unset in both the MutatingWebhookConfiguration and ValidatingWebhookConfiguration.
Removing the operator chart and reinstalling it sets the correct certificate.
I believe the issue is that the Job runs on every update. Helm 3 has hooks that let you specify when a Job should run.
By adding the annotation below to the cert-job, this issue should be resolved for fresh installs.
annotations:
  "helm.sh/hook": post-install
If you're running this from the Helm repo, the released version of the chart is unable to create secrets and will fail; the version on master is fine.
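For context, here is a minimal sketch of where that annotation would sit on the certificate Job. Only the `helm.sh/hook: post-install` annotation comes from the suggestion above; the Job name, image, and command are placeholders and are not taken from the chart's generate-cert.yaml. The `helm.sh/hook-delete-policy` line is an optional addition that spells out what Helm documents as its default behavior.

```yaml
# Sketch only: metadata.name, image, and command are hypothetical placeholders,
# not copied from the chart.
apiVersion: batch/v1
kind: Job
metadata:
  name: cert-job  # placeholder name
  annotations:
    # From the suggestion above: run the Job only after the initial install.
    "helm.sh/hook": post-install
    # Optional (assumption, not part of the suggestion): remove a previous
    # hook Job before a new one is created.
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: cert-job
          image: example/cert-generator:latest  # placeholder image
          command: ["/bin/sh", "-c", "echo 'generate certs and set caBundle here'"]
```

Helm also accepts a comma-separated list of hooks, such as `post-install,post-upgrade`, if the Job should rerun on upgrades. Whether that would fix the overwritten caBundle in this chart is untested here.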
I'm still seeing this behavior (version from master). Uninstalling and then reinstalling, as @KamalAman said, did solve this, but I think it's important to figure out why this keeps happening. Does anyone have a suggestion? @functicons, @hongyegong maybe?
The two WebhookConfigurations are first created by the helm template:
- https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/helm-chart/flink-operator/templates/flink-operator.yaml#L11
- https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/helm-chart/flink-operator/templates/flink-operator.yaml#L386
The cert-job is then started, and it applies a version of the webhookConfig with the caBundle set, using the "envsubst templates" stored in the webhook-configMap (https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/helm-chart/flink-operator/templates/generate-cert.yaml#L67).
If you run helm again, the cert-job will not run since it's still in the cluster (in state Completed), but the helm template versions with the "\n" caBundle will be applied, since they differ from what's in the cluster.
If you delete the Job before rerunning helm, the Job will be re-created and will overwrite the caBundle with a proper value, and everything works again.
A possible solution is to remove the two WebhookConfigs from flink-operator.yaml and leave it up to the cert-job to create them. A pre-delete hook is then needed to remove them on uninstall, as helm is no longer maintaining them directly.
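A rough sketch of what such a pre-delete hook could look like, under several assumptions: the Job name, service account, and image are hypothetical, and the validating webhook's name is guessed by analogy with the mutating one (only the mutating name appears in this thread). The service account would also need RBAC permission to delete webhook configurations.

```yaml
# Sketch only: names marked "hypothetical" or "assumed" are not from the chart.
apiVersion: batch/v1
kind: Job
metadata:
  name: webhook-cleanup  # hypothetical
  annotations:
    # Run before Helm deletes the release's resources.
    "helm.sh/hook": pre-delete
    # Remove the hook Job itself once it has succeeded.
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      serviceAccountName: webhook-cleanup  # hypothetical; needs delete rights on webhook configurations
      containers:
        - name: cleanup
          image: bitnami/kubectl:latest  # placeholder; any image that ships kubectl
          command:
            - kubectl
            - delete
            - mutatingwebhookconfiguration/flink-operator-mutating-webhook-configuration
            - validatingwebhookconfiguration/flink-operator-validating-webhook-configuration  # assumed name
            - --ignore-not-found
```

With this in place, the cert-job would be the only thing that creates the two webhook configurations, so a later helm upgrade would have no templated copy to overwrite the caBundle with.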
After the first deployment of the Flink operator, I also can't apply a job manifest because of a certificate validation error. I extracted the generated secret with the crt and key and imported it into the trust store; after that I was able to create a job with mflinkcluster.flinkoperator.k8s.io. What did I do wrong?