nifikop
[BUG] nifikop fails to scale down the nifi cluster when it misses the chance to set proper gracefulActionState
Bug Report
Similar to https://github.com/konpyutaika/nifikop/issues/79, nifikop might fail to scale down the nificluster if it misses the chance to set gracefulActionState to GracefulUpscaleSucceeded for the nifi node to be deleted.
More concretely, we find that GracefulActionState.State (in the nificluster CR) for each nifi node (pod) typically goes through the following changes:
1. It is set to GracefulUpscaleRequired inside reconcileNifiPod().
2. It is set to GracefulUpscaleRunning inside handlePodAddCCTask().
3. It is set to GracefulUpscaleSucceeded inside reconcileNifiPod() when the nifi pod is ready.
Both 1 and 3 happen inside reconcileNifiPod(), which is only invoked for each node in Spec.Nodes, as shown below:
for _, node := range r.NifiCluster.Spec.Nodes {
	...
	o = r.pod(node.Id, nodeConfig, pvcs, log)
	err = r.reconcileNifiPod(log, o.(*corev1.Pod))
	if err != nil {
		return err
	}
}
Suppose a user first creates a nificluster with 2 nodes and then scales it down to 1 node. If the user updates the nificluster CR to remove the last nifi node from Spec.Nodes between step 2 and step 3, reconcileNifiPod() is no longer invoked for that node, so the GracefulActionState.State of the last nifi pod is never set to GracefulUpscaleSucceeded and remains GracefulUpscaleRunning.
Since the GracefulActionState.State of the nifi node is GracefulUpscaleRunning, reconcileNifiPodDelete never adds it to nodesPendingGracefulDownscale, and the scale down never happens.
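The sequence above can be reproduced in a small, self-contained model. This is only a sketch, not nifikop code: the Cluster type and the functions reconcilePods, handleAddTask, pendingDownscale and simulate are hypothetical stand-ins. The rule that a node in GracefulUpscaleRunning is skipped for downscale is taken from the description above; the real condition in reconcileNifiPodDelete may be broader.

```go
package main

import "fmt"

// State mirrors the GracefulActionState.State values named in this report.
type State string

const (
	UpscaleRequired  State = "GracefulUpscaleRequired"
	UpscaleRunning   State = "GracefulUpscaleRunning"
	UpscaleSucceeded State = "GracefulUpscaleSucceeded"
)

// Cluster is a hypothetical, heavily simplified stand-in for the nificluster CR.
type Cluster struct {
	SpecNodes []int32         // node ids currently listed in Spec.Nodes
	Status    map[int32]State // per-node GracefulActionState.State
	PodReady  map[int32]bool  // whether the pod of each node is ready
}

// reconcilePods models the loop over Spec.Nodes: steps 1 and 3 only
// run for nodes that are still present in the spec.
func reconcilePods(c *Cluster) {
	for _, id := range c.SpecNodes {
		st, known := c.Status[id]
		switch {
		case !known:
			c.Status[id] = UpscaleRequired // step 1
		case st == UpscaleRunning && c.PodReady[id]:
			c.Status[id] = UpscaleSucceeded // step 3
		}
	}
}

// handleAddTask models step 2: Required -> Running.
func handleAddTask(c *Cluster) {
	for id, st := range c.Status {
		if st == UpscaleRequired {
			c.Status[id] = UpscaleRunning
		}
	}
}

// pendingDownscale lists nodes removed from the spec that are eligible for
// graceful downscale. Assumption: a node still in UpscaleRunning is skipped.
func pendingDownscale(c *Cluster) []int32 {
	inSpec := make(map[int32]bool)
	for _, id := range c.SpecNodes {
		inSpec[id] = true
	}
	var out []int32
	for id, st := range c.Status {
		if !inSpec[id] && st != UpscaleRunning {
			out = append(out, id)
		}
	}
	return out
}

// simulate removes node 1 from the spec between step 2 and step 3 and
// returns its final state plus the number of nodes pending downscale.
func simulate() (State, int) {
	c := &Cluster{
		SpecNodes: []int32{0, 1},
		Status:    map[int32]State{},
		PodReady:  map[int32]bool{},
	}
	reconcilePods(c)         // step 1 for both nodes
	handleAddTask(c)         // step 2 for both nodes
	c.SpecNodes = []int32{0} // user scales down before step 3
	c.PodReady[0], c.PodReady[1] = true, true
	reconcilePods(c) // step 3 only reaches node 0
	return c.Status[1], len(pendingDownscale(c))
}

func main() {
	st, n := simulate()
	fmt.Println(st, n)
}
```

In this model node 1 stays in GracefulUpscaleRunning and the pending list is empty, which matches the stuck scale down described above.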
What did you do? Scale down nifi cluster
What did you expect to see? The last nifi pod should be deleted successfully.
What did you see instead? Under which circumstances? The last nifi pod cannot be deleted and the scale down never happens.
Environment
- nifikop version: 1546e0242107bf2f2c1256db50f47c79956dd1c6
- go version: go1.13.9 linux/amd64
- Kubernetes version information: v1.18.9
Possible Solution
Maybe consider invoking reconcileNifiPod() for each currently running nifi pod, even if it is not in Spec.Nodes right now.
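One way to picture this suggestion is to build the set of node ids to reconcile as the union of the ids in Spec.Nodes and the ids of pods that are currently running. The helper below is a hypothetical sketch, not part of nifikop: nodesToReconcile and its parameters are illustrative names, and how the operator would actually obtain the running pod ids is left open.

```go
package main

import (
	"fmt"
	"sort"
)

// nodesToReconcile returns the sorted union of node ids taken from the spec
// and node ids of currently running pods, so that a node removed from the
// spec would still get its upscale state finalized.
func nodesToReconcile(specIDs, runningPodIDs []int32) []int32 {
	seen := make(map[int32]struct{})
	var out []int32
	for _, ids := range [][]int32{specIDs, runningPodIDs} {
		for _, id := range ids {
			if _, dup := seen[id]; !dup {
				seen[id] = struct{}{}
				out = append(out, id)
			}
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	// Node 1 was removed from the spec but its pod is still running.
	fmt.Println(nodesToReconcile([]int32{0}, []int32{0, 1})) // prints [0 1]
}
```

With such a union, the loop shown earlier would still visit the removed node, giving reconcileNifiPod() the chance to move it to GracefulUpscaleSucceeded before the downscale check runs.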
Additional context We are willing to help fix the bug. The bug is automatically found by our tool Sieve: https://github.com/sieve-project/sieve