flink-on-k8s-operator
Getting context deadline exceeded error on eks cluster
I am trying to deploy Flink on an AWS EKS cluster. The cluster does not have a dedicated master node, since it is an Amazon-managed cluster. I have successfully deployed the flink-operator chart using Helm. Next I tried deploying the flink-session-cluster chart, and it fails with the error below:
```
helm install flink-session flink/flink-session-cluster/ -f version.yaml --debug
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/centos/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /home/centos/.kube/config
install.go:172: [debug] Original chart version: ""
install.go:189: [debug] CHART PATH: /opt/eva/helm/charts/flink/flink-session-cluster
client.go:122: [debug] creating 5 resource(s)
Error: Internal error occurred: failed calling webhook "mflinkcluster.flinkoperator.k8s.io": Post https://flink-operator-webhook-service.default.svc:443/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster?timeout=30s: context deadline exceeded
helm.go:81: [debug] Internal error occurred: failed calling webhook "mflinkcluster.flinkoperator.k8s.io": Post https://flink-operator-webhook-service.default.svc:443/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster?timeout=30s: context deadline exceeded
```
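For reference, a first sanity check (the service name and namespace are taken from the error message; the grep pattern is an assumption about the pod name) is to confirm the operator pod is running and the webhook service actually has endpoints:

```
# Confirm the operator pod is up and the webhook service resolves to it.
# Service name and namespace come from the error message above.
kubectl get pods --all-namespaces | grep flink-operator
kubectl get svc,endpoints flink-operator-webhook-service -n default
```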
One observation: there is no communication (ping does not work) between the node I am deploying the session cluster chart from (the AWS launch pad machine) and the webhook service flink-operator-webhook-service.default.svc.
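As far as I understand, that ping test is expected to fail: `*.svc` DNS names only resolve inside the cluster, and ClusterIP services do not answer ICMP anyway. A sketch of an in-cluster reachability test instead (the pod name and image are arbitrary choices):

```
# Run a throwaway curl pod and hit the webhook endpoint from inside the
# cluster; a TLS handshake (even with a certificate error) proves reachability.
kubectl run webhook-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -vk --max-time 5 https://flink-operator-webhook-service.default.svc:443/
```

Note that the path failing in the error is API server to webhook pod, which on EKS crosses the control plane security group, so this test succeeding does not rule out a security group problem (see the reply below).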
I have successfully deployed the Flink operator and a Flink cluster on a non-AWS Kubernetes cluster, and there this communication works.
Can someone help me figure out what the issue is?
Hello @vinaykw, did you resolve this issue? (more than one year later ^^)
Hello,
I hit a similar issue with another operator; maybe this can help:
Take a look at the security groups (with terraform-aws-eks they are named xxxx-cluster and xxxx-node). By default only ports 443 and 10250 are allowed; if your webhook pod does not listen on one of the allowed ports, traffic to it is blocked. This happens even though the Kubernetes API server calls the service URL on port 443, because the service forwards the request to the pod's target port, and it is that port the node security group must allow.
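As a sketch of the fix (the security group IDs are placeholders, and 9443 is only an example webhook port; check the real port with the endpoints command below), you can open the webhook's target port from the cluster security group to the node security group:

```
# Placeholders: replace sg-NODE and sg-CLUSTER with your node and cluster
# security group IDs, and 9443 with the port the webhook pod listens on.
aws ec2 authorize-security-group-ingress \
  --group-id sg-NODE \
  --protocol tcp \
  --port 9443 \
  --source-group sg-CLUSTER
```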
To check which port is actually in use, run `kubectl get endpoints -n <operator namespace>`. You can read more about endpoints here.
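The output should look roughly like this (values are illustrative); the port after the colon in the ENDPOINTS column is the one the security group rule must allow:

```
NAME                             ENDPOINTS        AGE
flink-operator-webhook-service   10.0.1.23:9443   5d
```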
Some operators, such as prometheus or cert-manager, don't have this issue because their validation webhooks listen on port 10250.