flink-on-k8s-operator Pod Affinity Feature Causing Flink Pipeline Redeployment to Fail

We have Flink pipelines running in our production environment on the Spotify flink-on-k8s-operator v0.4.2 release.

We wanted to upgrade to the latest release, v0.5.0, which adds pod affinity support. When we tried this in lower environments, redeploying a Flink pipeline failed with an error about HorizontalPodAutoscaler. This is the error from the Flink Operator logs:

{"level":"error","ts":"2023-04-18T17:51:04Z","logger":"controllers.FlinkCluster","msg":"Failed to observe the current state","controller":"flinkcluster","controllerGroup":"flinkoperator.k8s.io","controllerKind":"FlinkCluster","FlinkCluster":{"name":"dataprep-v1","namespace":"flink-dataprep"},"namespace":"flink-dataprep","name":"dataprep-v1","reconcileID":"bfffad73-7557-4d9f-bc97-320fd42cc598","error":"no matches for kind \"HorizontalPodAutoscaler\" in version \"autoscaling/v2\"","stacktrace":"github.com/spotify/flink-on-k8s-operator/controllers/flinkcluster.(*FlinkClusterHandler).reconcile\n\t/workspace/controllers/flinkcluster/flinkcluster_controller.go:153\ngithub.com/spotify/flink-on-k8s-operator/controllers/flinkcluster.(*FlinkClusterReconciler).Reconcile\n\t/workspace/controllers/flinkcluster/flinkcluster_controller.go:97\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:235"}

We added pod affinity to our FlinkCluster spec and started seeing this failure. We did not see it with the previous operator version.

It looks like the autoscaling/v2 HorizontalPodAutoscaler API requires an EKS cluster on version 1.22 or 1.23+. Can someone confirm this behavior? Our EKS cluster is on 1.21, which the release notes say is the minimum required version.

Apr 18 '23 18:04 guruguha

Hi @guruguha!

IIRC, autoscaling/v2 is only available from Kubernetes 1.23, and indeed the README lists the wrong prerequisites.

Apr 28 '23 10:04 regadas
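Since the failure comes down to the cluster's Kubernetes version, it can help to gate the operator upgrade on that version. Below is a minimal Go sketch, assuming the 1.23 threshold stated in the reply above and a server version string of the usual "vMAJOR.MINOR.PATCH-eks-…" shape. The function name servesHPAv2 and the sample version strings are hypothetical and made up for illustration; they are not part of the operator or of EKS.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// minHPAv2Minor is the Kubernetes 1.x minor version from which the
// autoscaling/v2 HorizontalPodAutoscaler API is assumed to be served (1.23).
const minHPAv2Minor = 23

// versionRe captures MAJOR and MINOR from a server version such as
// "v1.21.14-eks-example"; the patch number and any EKS suffix are ignored.
var versionRe = regexp.MustCompile(`^v?(\d+)\.(\d+)`)

// servesHPAv2 reports whether a cluster with the given server version is
// expected to serve autoscaling/v2, based only on version arithmetic.
func servesHPAv2(gitVersion string) (bool, error) {
	m := versionRe.FindStringSubmatch(gitVersion)
	if m == nil {
		return false, fmt.Errorf("unrecognized version %q", gitVersion)
	}
	major, err := strconv.Atoi(m[1])
	if err != nil {
		return false, err
	}
	minor, err := strconv.Atoi(m[2])
	if err != nil {
		return false, err
	}
	return major > 1 || (major == 1 && minor >= minHPAv2Minor), nil
}

func main() {
	// Hypothetical version strings for illustration only.
	for _, v := range []string{"v1.21.14-eks-example", "v1.23.17-eks-example"} {
		ok, err := servesHPAv2(v)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s serves autoscaling/v2: %t\n", v, ok)
	}
}
```

Version arithmetic is only a proxy. Asking the API server directly, for example by running kubectl api-versions and looking for autoscaling/v2 in the output, reflects what the cluster actually serves and is likely the more reliable pre-upgrade check.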

@regadas Thanks for confirming, and thanks for creating the PR as well.

Apr 28 '23 16:04 guruguha
