flinkk8soperator
Update to FlinkApplication in ClusterStarting phase blocks transition to Running phase
When a FlinkApplication custom resource is added, the controller of the flink-operator creates jobmanager and taskmanager deployments. These are labelled with a hash value, flink-hash, which is computed from the FlinkApplication, including all its annotations and labels.
During the lifetime of a FlinkApplication, such annotations are sometimes added by other operators. A typical example is the helm-operator (https://github.com/fluxcd/helm-operator). When a FlinkApplication is created by the helm-operator on the basis of a HelmRelease referencing a Helm chart, an annotation helm.fluxcd.io/antecedent is added to the FlinkApplication shortly after its creation.
If the FlinkApplication is already in its Running phase, this triggers an update of the Flink cluster, i.e., the Flink cluster is recreated. This seems generally fine, but it may be unnecessary when the change to the FlinkApplication does not affect the properties of the Flink cluster itself.
However, when the update to the FlinkApplication happens while it is still in the ClusterStarting phase, the hash value of the FlinkApplication changes due to the update. As a consequence, the deployments for the jobmanager and taskmanagers cannot be found, because they are still labelled with the original hash value. Therefore, the method IsClusterReady of the controller always returns false, and the FlinkApplication never leaves the ClusterStarting phase. See https://github.com/fluxcd/helm-operator/issues/243.
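The failure mode described above can be reproduced in a small, self-contained Go sketch. It is not the operator's actual code: the App and Spec types, the hashWholeResource function, the isClusterReady lookup and the annotation value are all hypothetical stand-ins, and SHA-256 over the JSON encoding is just one assumed way of hashing the resource.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Spec is a hypothetical, heavily reduced stand-in for the cluster-relevant
// part of a FlinkApplication.
type Spec struct {
	Image       string `json:"image"`
	Parallelism int    `json:"parallelism"`
}

// App is a hypothetical stand-in for a FlinkApplication resource.
type App struct {
	Annotations map[string]string `json:"annotations,omitempty"`
	Labels      map[string]string `json:"labels,omitempty"`
	Spec        Spec              `json:"spec"`
}

// hashWholeResource hashes metadata and spec together (assumed scheme).
// encoding/json sorts map keys, so the encoding is deterministic.
func hashWholeResource(app App) string {
	b, err := json.Marshal(app)
	if err != nil {
		panic(err)
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// isClusterReady mimics a lookup of deployments by their hash label.
// deploymentsByHash maps a hash label value to "deployment exists".
func isClusterReady(deploymentsByHash map[string]bool, app App) bool {
	return deploymentsByHash[hashWholeResource(app)]
}

func main() {
	app := App{Spec: Spec{Image: "flink:1.8", Parallelism: 2}}

	// Deployments are created and labelled with the hash at creation time.
	deployments := map[string]bool{hashWholeResource(app): true}
	fmt.Println(isClusterReady(deployments, app)) // true

	// Another operator adds an annotation while the cluster is starting.
	app.Annotations = map[string]string{
		"helm.fluxcd.io/antecedent": "default:helmrelease/my-app", // illustrative value
	}
	fmt.Println(isClusterReady(deployments, app)) // false: lookup uses the new hash
}
```

The spec is identical in both calls; only the metadata changed, yet the lookup no longer finds the deployments.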
One possible approach would be to compute the hash value not from the whole FlinkApplication resource, but only from the values that should actually trigger an update of the cluster.
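A minimal Go sketch of that idea, under the assumption that only the spec should influence the hash. The App and Spec types and the hashSpec function are hypothetical and much smaller than the real resource; which fields belong in the hash is a design decision for the maintainers.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Spec is a hypothetical stand-in for the fields that define the Flink cluster.
type Spec struct {
	Image       string `json:"image"`
	Parallelism int    `json:"parallelism"`
}

// App is a hypothetical stand-in for a FlinkApplication resource.
type App struct {
	Annotations map[string]string
	Labels      map[string]string
	Spec        Spec
}

// hashSpec ignores annotations and labels and hashes only the spec.
func hashSpec(app App) string {
	b, err := json.Marshal(app.Spec)
	if err != nil {
		panic(err)
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

func main() {
	before := App{Spec: Spec{Image: "flink:1.8", Parallelism: 2}}

	// Adding an annotation leaves the hash unchanged.
	after := before
	after.Annotations = map[string]string{"helm.fluxcd.io/antecedent": "example"}
	fmt.Println(hashSpec(before) == hashSpec(after)) // true

	// A change to the spec still produces a new hash.
	after.Spec.Parallelism = 4
	fmt.Println(hashSpec(before) == hashSpec(after)) // false
}
```

With such a scheme, a metadata-only update would neither block the ClusterStarting phase nor recreate a cluster that is already Running.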
A related problem is that all annotations of the FlinkApplication are propagated to the jobmanager and taskmanager deployments. For annotations like the helm-operator one mentioned above, this is not desirable, because the helm-operator uses that annotation to identify the resources it explicitly manages.
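One way to avoid this would be to filter annotations before copying them to the deployments. The following Go sketch assumes a prefix-based exclusion list; the excludedPrefixes variable and the propagatedAnnotations function are hypothetical and not part of the operator, and the annotation values are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// excludedPrefixes lists annotation key prefixes that should stay on the
// FlinkApplication only (hypothetical configuration).
var excludedPrefixes = []string{"helm.fluxcd.io/"}

// propagatedAnnotations returns a copy of in without the excluded keys.
func propagatedAnnotations(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for key, value := range in {
		excluded := false
		for _, prefix := range excludedPrefixes {
			if strings.HasPrefix(key, prefix) {
				excluded = true
				break
			}
		}
		if !excluded {
			out[key] = value
		}
	}
	return out
}

func main() {
	appAnnotations := map[string]string{
		"helm.fluxcd.io/antecedent": "default:helmrelease/my-app", // illustrative value
		"example.com/owner":         "data-team",
	}
	// Only the annotation outside the excluded prefix is propagated.
	fmt.Println(propagatedAnnotations(appAnnotations)) // map[example.com/owner:data-team]
}
```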