
Pod Affinity Feature Causing Flink Pipeline Redeployment to Fail

Open guruguha opened this issue 1 year ago • 2 comments

We have Flink pipelines running in our production environment on the Spotify operator, release v0.4.2.

We wanted to upgrade to the latest release, v0.5.0, which adds pod affinity support. When we did this in our lower environments, redeploying a Flink pipeline failed with an error about HorizontalPodAutoscaler. Below is the error we see in the Flink operator logs:

{"level":"error","ts":"2023-04-18T17:51:04Z","logger":"controllers.FlinkCluster","msg":"Failed to observe the current state","controller":"flinkcluster","controllerGroup":"flinkoperator.k8s.io","controllerKind":"FlinkCluster","FlinkCluster":{"name":"dataprep-v1","namespace":"flink-dataprep"},"namespace":"flink-dataprep","name":"dataprep-v1","reconcileID":"bfffad73-7557-4d9f-bc97-320fd42cc598","error":"no matches for kind \"HorizontalPodAutoscaler\" in version \"autoscaling/v2\"","stacktrace":"github.com/spotify/flink-on-k8s-operator/controllers/flinkcluster.(*FlinkClusterHandler).reconcile\n\t/workspace/controllers/flinkcluster/flinkcluster_controller.go:153\ngithub.com/spotify/flink-on-k8s-operator/controllers/flinkcluster.(*FlinkClusterReconciler).Reconcile\n\t/workspace/controllers/flinkcluster/flinkcluster_controller.go:97\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235"}

We added pod affinity to our cluster spec and started seeing this failure. We did not see it with the previous operator version.

It looks like the HorizontalPodAutoscaler in autoscaling/v2 requires the EKS cluster to be on 1.22/1.23 or later. Can someone confirm this behavior? Our EKS cluster is on 1.21, which the release notes say is the minimum required version.

guruguha, Apr 18 '23 18:04

Hi @guruguha!

iirc, autoscaling/v2 is only available from Kubernetes 1.23 onwards, so the README does indeed list the wrong prerequisites.
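
One way to check which `autoscaling` versions a cluster actually serves is to look at the output of `kubectl api-versions`. The Python sketch below is only an illustration of that check. The helper names are hypothetical and not part of the operator, and it assumes `kubectl` is installed and pointed at the cluster in question.

```python
import subprocess


def hpa_api_versions(api_versions_output: str) -> list[str]:
    """Return the autoscaling group versions found in `kubectl api-versions` output."""
    return sorted(
        line.strip()
        for line in api_versions_output.splitlines()
        if line.strip().startswith("autoscaling/")
    )


def serves_autoscaling_v2(api_versions_output: str) -> bool:
    """True if the API server lists autoscaling/v2 (not just v2beta1/v2beta2)."""
    return "autoscaling/v2" in hpa_api_versions(api_versions_output)


def check_cluster() -> bool:
    """Hypothetical helper: run kubectl against the current context and report the result."""
    output = subprocess.run(
        ["kubectl", "api-versions"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    print("autoscaling versions served:", hpa_api_versions(output))
    return serves_autoscaling_v2(output)
```

Calling `check_cluster()` before upgrading the operator should show whether the cluster would hit the `no matches for kind "HorizontalPodAutoscaler" in version "autoscaling/v2"` error above.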

regadas, Apr 28 '23 10:04

@regadas Thanks for confirming. Thanks for creating the PR as well.

guruguha, Apr 28 '23 16:04