flink-on-k8s-operator
Failed calling webhook: context deadline exceeded on AWS EKS
We are seeing the error below when creating the Flink job cluster from the sample provided in the repo. Our Flink deployment with the operator and a job cluster works fine on Azure AKS, but the error below occurs on AWS EKS.
Error from server (InternalError): error when creating "flink-on-k8s-operator-flink-operator-0.2.0/helm-chart/flink-job-cluster/flink-job-cluster.yaml": Internal error occurred: failed calling webhook "mflinkcluster.flinkoperator.k8s.io": Post https://flink-operator-webhook-service.flink-operator-system.svc:443/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster?timeout=30s: context deadline exceeded
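To see which in-cluster endpoint the API server failed to reach, the URL in the error can be broken into its parts. The sketch below is only an illustration: `describe_webhook_url` is a hypothetical helper, not part of the operator, and it assumes the host follows the usual Kubernetes Service DNS form `<service>.<namespace>.svc`.

```python
from urllib.parse import urlsplit, parse_qs


def describe_webhook_url(url: str) -> dict:
    """Split a webhook URL into the Service, namespace, port, path and timeout.

    Hypothetical helper for reading the error message; it assumes the host
    has the form <service>.<namespace>.svc.
    """
    parts = urlsplit(url)
    host_labels = parts.hostname.split(".")
    service, namespace = host_labels[0], host_labels[1]
    return {
        "service": service,
        "namespace": namespace,
        "port": parts.port,
        "path": parts.path,
        # parse_qs returns a list per key; take the first value if present
        "timeout": parse_qs(parts.query).get("timeout", [None])[0],
    }


# URL copied verbatim from the error message above
info = describe_webhook_url(
    "https://flink-operator-webhook-service.flink-operator-system.svc:443"
    "/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster?timeout=30s"
)
for key, value in info.items():
    print(f"{key}: {value}")
```

The result names the Service (`flink-operator-webhook-service`), its namespace (`flink-operator-system`) and the port (443) that the API server must be able to reach before the 30s timeout expires.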
The job-cluster YAML:
apiVersion: flinkoperator.k8s.io/v1beta1
kind: FlinkCluster
metadata:
  name: flink-job-cluster
  labels:
    app: flink-job-cluster
    chart: flink-job-cluster-0.1.51
    release: flink-job-cluster
spec:
  image:
    name: "flink:1.9.3"
  envVars:
    - name: HADOOP_CLASSPATH
      value: /opt/flink/opt/flink-metrics-prometheus-1.9.3.jar
  jobManager:
    accessScope: Cluster
    ports:
      ui: 8081
    extraPorts:
      - containerPort: 9249
        name: prom
    resources:
      limits:
        cpu: 200m
        memory: 1024Mi
    podAnnotations:
      fluentbit.io/parser: foo
  taskManager:
    replicas: 2
    extraPorts:
      - containerPort: 9249
        name: prom
        protocol: TCP
    resources:
      limits:
        cpu: 200m
        memory: 1024Mi
    podAnnotations:
      fluentbit.io/parser: foo
  job:
    jarFile: ./examples/streaming/WordCount.jar
    className: org.apache.flink.streaming.examples.wordcount.WordCount
    args: ["--input", "./README.txt"]
    parallelism:
    restartPolicy: Never
    podAnnotations:
      fluentbit.io/parser: foo
  flinkProperties:
    metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
    taskmanager.numberOfTaskSlots: "1"
Using flink-operator 0.2.0 and EKS 1.17
Hi, did you find a resolution to this issue?
Hello,
Maybe this issue has the same origin as #399. See https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/issues/399#issuecomment-1206193790.